The 10 Levels Study I

By Bill James

June 16, 2014

1. The Bad Warren Spahn

A quick observation about Warren Spahn: Did you ever notice that his worst seasons are actually his best seasons? Between 1949 and 1963, Spahn had only three seasons in which he didn’t win 20 games: 1952, 1955 and 1962. In 1955 he went 17-14 with a 3.26 ERA in 246 innings; he was third in the National League in Innings Pitched and fourth in ERA. . .well, OK, that’s a terrible season; there is no way to defend that one. But the other two "bad" seasons, 1952 and 1962, are not only as good as his other seasons, they’re arguably better. In 1952 he was 14-19, but had easily the best strikeout to walk ratio of his career, 183 to 73. Well, "easily". . .his ratio was almost as good in 1956, but with fewer strikeouts and walks. I have learned from Tom Tango that the most meaningful indicator is the strikeout to walk margin—183 to 73, a margin of 110. His best in any other season was 80. He pitched 290 innings, his fifth-best total (although he also pitched exactly 290 in two other seasons). He gave up only 19 homers, fewer than he had given up in any of the three previous seasons. His ERA, 2.98, was the same as it had been the previous season, when he had won 22 games.

That’s 1952, but what got me started on this was 1962. Spahn won 20 games in 1956, 1957, 1958, 1959, 1960, 1961, and 1963, usually going about 21-11. In 1962 he went just 18-14. But the 1962 season is not only as good as the other seasons; it’s actually better. Lee Sinins’ Complete Baseball Encyclopedia has a summary column called "RSAA", or Runs Saved Against Average. It is park- and league-adjusted. Spahn is credited with 24 RSAA in 1962—more than he had in any other season after 1956.

The reason I started this, though, was a different study, my own study; I am merely citing Sinins to establish that I might not be crazy. I have done a very careful study of every start in my data base—241,536 starts from 1952 to 2013—evaluating every start based on

1) The Season,

2) The Park,

3) The Opposition, and

4) The pitcher’s performance.

That is, whereas almost all evaluations of pitchers are based on season totals, implicitly assuming that the quality of opposition faced by one pitcher is the same as the quality of opposition faced by another, this evaluation looks at the teams the pitcher was matched up against. The conclusion of this approach is that Spahn did in fact pitch better in 1962 than in any other season in my data. In 34 starts in 1962 Spahn had 5 starts which score at "10" on a 10-point scale, 7 which score at "9", 3 which score at "8", and 6 which score at "7". 21 of 34 starts score at 7 or higher on a 10-point scale; only 8 score at 4 or lower. His average performance evaluation in 34 starts was 6.50—the highest of his career, in the data that I have—and he pitched well in 24 out of 34 starts, the highest percentage of his career.

2. Pedro Ramos, 1960

Pedro Ramos in 1960 finished 11-18 with a 3.45 ERA—about the same won-lost record that he had in 1958 (14-18), 1959 (13-19) or 1961 (11-20), although his ERA was better in 1960. According to this study, however, Ramos was actually the second most-valuable pitcher in the American League in 1960, behind another pitcher with a losing record, Jim Bunning. Bunning was 11-14 but with a great ERA and strikeout/walk ratio, so I’ve argued many times before that Bunning was the best pitcher in the league despite his 11-14 record. The argument about Ramos is new.

Ramos in 1960 made 36 starts, which led the American League. But what you wouldn’t know is who he started against. The four best offenses in the league in 1960 (park-adjusted) were the Yankees, the White Sox, the Orioles and the Indians; the four worst offenses were the Kansas City A’s, the Tigers, the Red Sox and the Senators. Ramos started 23 times against the "good" offenses, 13 times against the weak offenses. If you weight the won-lost record of each team by the number of starts Ramos made against them, the average quality of his opposition is .531. When you take that into account. . .he’s very nearly the best pitcher in the league.

3. Explaining My Study in Very General Terms

I called this the 10 Levels Study, but there were actually 11 levels. I sorted every start by every pitcher into one of 11 groups: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, or 0. There were the same number of starts at every level, so an average start was exactly 5.000000.

Not only was the average start at 5.00000, but an average start in every season was very near to 5.00. An average start in 1968, when the National League ERA was 2.98, would be very close to 5.00, and an average start in 2000, when the league ERA was 4.64, would be close to 5.00, because I adjusted for the runs scored in the league. An average start in Fenway Park would be about 5.00, and an average start in a pitcher’s park would be about 5.00, since I adjusted for the ballpark. And an average start against the Big Red Machine in 1976, or against the Yankees in 1998, would be about 5.00, and an average start against the 1962 Mets would be about 5.00, because I adjusted for the quality of the opposition’s offense.

Then I counted the number of times that each pitcher pitched at a "10" level, a "9" level, etc. There were 21,958 starts in each group, except that the "10" level and the "0" level only had 21,957, because 241,536 is not evenly divisible by 11.
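A minimal sketch of how the group sizes work out, assuming the leftover starts are handed to the middle levels so that the two extremes come up one short. The function name `level_sizes` and the middle-outward tie rule are illustrative assumptions; the article does not say how ties at the boundaries were broken.

```python
def level_sizes(n_starts, n_levels=11):
    """Split n_starts into n_levels groups that are as equal as possible.

    Leftover starts go to the levels nearest the middle, so the two
    extreme levels are the ones left one start short (an assumption
    that matches the counts quoted in the article).
    """
    base, extra = divmod(n_starts, n_levels)
    middle = (n_levels - 1) / 2
    # Levels ordered from the middle outward: 5, 4, 6, 3, 7, ...
    by_centrality = sorted(range(n_levels), key=lambda lv: (abs(lv - middle), lv))
    sizes = {lv: base for lv in range(n_levels)}
    for lv in by_centrality[:extra]:
        sizes[lv] += 1
    return sizes


sizes = level_sizes(241_536)
total_points = sum(level * count for level, count in sizes.items())
average_level = total_points / sum(sizes.values())
print(sizes[10], sizes[5], sizes[0], average_level)
```

Because the sizes are symmetric around level 5, the overall average comes out to 5 exactly, which is the property the study relies on.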

4. I Never Knew That

This is what I had never understood until doing this study. . .and it is absolutely amazing that I never understood this, because it is an extremely fundamental truth about the game, which I had somehow unaccountably missed up until this point. Dominant pitchers almost never actually have bad games. I never knew that. Guys like Koufax, Carlton, Gibson, Pedro, the Big Unit, Gooden when he was good. . .they almost never actually have bad games. They lose sometimes, because sometimes they run up against another pitcher having an equally good day, and sometimes they give up a few runs because they may be pitching against a good team in a good hitter’s park or something. But in terms of just having a bad day. . .they almost never do. Their Good Game/Bad Game percentage is actually very close to 1.000.

Suppose we count every start that scores at "6" or above as a "Good Game", and every start at "4" or below as a "Bad Game", and every start at "5" as a No Decision. An average Good Game percentage is .500. Randy Johnson in 1997 was 25-0. He did have four no-decisions—four games that scored at "5" on a 10-point scale—but no actual bad games.

Randy was the only pitcher who was perfect in more than 12 starts, but Pedro Martinez in various seasons was 27-1, 27-1, 28-3 and 24-4. Randy was 31-1 in 2001. Bob Gibson was 30-1 in 1968 and 30-3 in 1969. Greg Maddux in different seasons was 24-2, 23-2 and 25-4. Another pitcher, who I will discuss later, was 34-1 in one season; in other seasons he was 35-5 and 21-3. Sandy Koufax in his big four seasons was 32-8, 23-2, 35-5 and 34-5. Sometimes these guys lose, but it’s not because they don’t pitch well. They pitch well every time out.

5. Starting on Two Days Rest

While I was doing this, I started wondering about the disappearance of pitchers starting on short rest. . .when exactly did that stop being an option?

Since I had the data in front of me, I decided to count the number of times a pitcher started on two days’ rest or less. (Wilbur Wood once famously started both games of a double-header, and Al Santorini, less famously, also did that, on May 26, 1971; he faced just one batter in the first game.)

Anyway, I counted pitchers starting on two days’ rest or less. This happens most often, I discovered, after a pitcher has been knocked out early in the previous start. Also, I discovered that it is very difficult to determine exactly what constitutes being knocked out early in your previous start.

Let’s start with a simple, unadorned count of how many pitchers started on two days rest or less, by season:

	0	1	2	3	4	5	6	7	8	9
195-			70	71	62	52	79	64	58	43
196-	56	62	64	47	58	64	61	51	39	57
197-	44	44	81	95	26	36	31	32	26	13
198-	14	5	19	7	8	12	12	8	7	5
199-	16	6	3	8	4	10	5	5	5	3
200-	7	4	3	3	3	3	2	1	2	2
201-	1	3	2

The surge in these numbers in 1972-1973. . .that’s the Chuck Tanner blip. I had kind of forgotten that entire episode; I was in the Army when it happened, and not able to follow the season with my usual energy. But Chuck Tanner with the 1972-73 Chicago White Sox went to essentially a three-man rotation, starting Wilbur Wood, Stan Bahnsen and Tom Bradley frequently on two days’ rest (Wood, Bahnsen and Steve Stone in 1973). Bradley started 8 times on two days’ rest in 1972, none in 1973; Bahnsen started 8 times on two days’ rest in 1972 and 8 times in 1973, and Wilbur Wood started 14 times on two days’ rest in 1971, 25 times in 1972 and 19 times in 1973. Steve Stone in 1973 started six times on short rest.

It wasn’t all Tanner; it was in the air. Starting pitchers on short rest had been on the way out since my data set begins in 1952, but it came back, for a couple of years. Paul Splittorff started four times on short rest in 1973, Phil Niekro 9 times; Mickey Lolich did it 3 times in 1971, 2 in 1972, and 3 in 1973. Blyleven started 3 times on short rest in 1972.

That was the last gasp of that strategy; it died quickly after that. Above I gave you the counts of pitchers starting on two days’ rest (or less), but about half of those are pitchers who were knocked out early in their previous start. . .not Wilbur Wood, obviously, but many of the others. These are the counts of pitchers starting on two days’ rest or less after having faced 15 or more batters in the previous start:

	0	1	2	3	4	5	6	7	8	9
195-			46	46	34	29	42	31	28	20
196-	26	28	36	23	26	26	33	29	23	26
197-	23	31	63	78	15	19	16	11	8	3
198-	7	1	6	3	3	5	4	1	2	2
199-	6	2	1	2	2	3	2	2	1	1
200-	1	0	0	0	0	0	0	0	0	0
201-	0	0	0

If you work the data, you can see that in the 1950s, 45% of pitchers who started on two days’ rest or less had been knocked out early in their previous starts (that is, had faced fewer than 15 batters). In the 1960s this increased to 51%. In the 1970s, because of the Chuck Tanner blip, it decreased to 38%; Tanner was not starting pitchers on two days’ rest after they were knocked out early, he was doing it because he had very few good pitchers. But in the 1980s, 65% of pitchers who started on short rest had been knocked out early in their previous start; in the 1990s, 66%. Since 2001, every pitcher who has started on two days’ rest has been knocked out early in the previous start.
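"Working the data" here means subtracting the second table from the first, decade by decade. A short sketch with the yearly counts copied from the two tables above; the variable and function names are illustrative, and "knocked out early" is taken to mean facing fewer than 15 batters in the previous start, since that is the cutoff the second table uses.

```python
# All starts on two days' rest or less (first table; the 1950s row covers 1952-1959).
all_short_rest = {
    "1950s": [70, 71, 62, 52, 79, 64, 58, 43],
    "1960s": [56, 62, 64, 47, 58, 64, 61, 51, 39, 57],
    "1970s": [44, 44, 81, 95, 26, 36, 31, 32, 26, 13],
    "1980s": [14, 5, 19, 7, 8, 12, 12, 8, 7, 5],
    "1990s": [16, 6, 3, 8, 4, 10, 5, 5, 5, 3],
}

# Same, restricted to pitchers who faced 15+ batters last time out (second table).
after_full_outing = {
    "1950s": [46, 46, 34, 29, 42, 31, 28, 20],
    "1960s": [26, 28, 36, 23, 26, 26, 33, 29, 23, 26],
    "1970s": [23, 31, 63, 78, 15, 19, 16, 11, 8, 3],
    "1980s": [7, 1, 6, 3, 3, 5, 4, 1, 2, 2],
    "1990s": [6, 2, 1, 2, 2, 3, 2, 2, 1, 1],
}


def knocked_out_share(decade):
    """Fraction of short-rest starts that followed an early exit."""
    total = sum(all_short_rest[decade])
    full = sum(after_full_outing[decade])
    return (total - full) / total


shares = {decade: round(100 * knocked_out_share(decade)) for decade in all_short_rest}
print(shares)
```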

6. Stieb and Spahn

It is true, but sad, that wins play a huge role in deciding who gets into the Hall of Fame. That's why Dave Stieb, for instance, who was more valuable in his four or five best seasons tha(n) Warren Spahn was, will never get anywhere near the Hall.

David Kaiser

(Commenting on Dave Fleming article, Defending the Win)

Well. . .I would sort of two-thirds agree with that; maybe three-fourths. Dave Stieb, according to this study (and in concert with many previous studies) was probably the best starting pitcher in the American League in three consecutive seasons, and may have deserved as many as four Cy Young Awards. He won none. He was the best starting pitcher in the American League in 1983; the Cy Young Award went to LaMarr Hoyt, at that time not yet a criminal. He was the best starting pitcher in the league in 1984; the Cy Young Award—and the MVP—went to a reliever, Willie Hernandez. He was the best starting pitcher in the American League in 1985; the Cy Young Award went to Bret Saberhagen.

My method does not show Stieb as the best pitcher in the American League in 1982; it ranks him third, behind Floyd Bannister and Rick Sutcliffe. Still, Stieb may have deserved the Cy Young Award that year, too; certainly he was better than Ugly Pete, who won the award, and he does rank first by other methods, and good methods. In 1981 Stieb was probably the fourth-best starting pitcher in the American League, behind Steve McCatty, Dave Righetti and Jack Morris; the Cy Young Award (and the MVP) went to a reliever, Rollie Fingers. After 1985 Stieb was not the same, although he was about the sixth-best starting pitcher in the American League in 1990.

I have to agree, then, that Stieb was the best starting pitcher in the American League in his era, and I have to agree that he was denied Cy Young Awards that he deserved because of his won-lost records. I will also agree generally that the influence of won-lost records on award voting has been excessive and pernicious; in that debate I switch teams more often than Edwin Jackson, but at the moment I am on the team opposed to Won-Lost records.

I also have to agree, reluctantly, that Stieb was more dominant in his best years than Warren Spahn was, at least in the data that I have. The best season that Spahn had, in my data, was 1962. I have several different indicators of overall value coming out of this study, but Stieb beats Spahn’s best seasons by every indicator in 1983, 1984 and 1985, and by some indicators in 1982 as well. Stieb was in fact more dominant than Spahn, at least the Warren Spahn covered in my data.

Stieb was more dominant than Spahn, but not much more dominant than Spahn; he was better than Spahn by a thin margin, whereas Spahn was outstanding for more than twice as long. Stieb was more dominant than Spahn, but his level of dominance does not approach that of many other pitchers in the data—Seaver, Clemens, Gibson, Pedro, Big Unit, Carlton, etc.

But as to the suggestion that Stieb has been denied Hall of Fame consideration because of his Won-Lost records. . .absolutely not. Dave Stieb does not remotely approach the standard of a Hall of Fame pitcher, and it would be absurd for him to be considered for that honor. Stieb was the best pitcher in the American League at a time when the American League was desperately short of outstanding pitchers.

Stieb made 412 starts in his career, which is a short career for a Hall of Fame candidate. There are Hall of Famers in that range and below, like Koufax and Dazzy Vance, but those are pitchers who were much more dominant than Stieb was. Pedro Martinez and Roy Halladay are deserving Hall of Famers in about that number of starts. But in his 412 starts, Stieb is extremely comparable to four other pitchers who had about the same number of starts: Larry Jackson, Kevin Appier, Mark Langston and Steve Rogers. Let’s look first at games started:

Mark Langston, 428

Larry Jackson, 420

Dave Stieb, 412

Kevin Appier, 402

Steve Rogers, 393

Larry Jackson actually made 429 starts, not 420; I am missing data for nine starts from early in his career. Anyway, let’s break those down into "Good Starts" and "Bad Starts". Stieb in his career had 231 "Good Starts"—considering where he was pitching and who he was pitching against—153 "Bad Starts", and 28 starts which are kind of neutral. All of these other pitchers had similar data:

First	Last	Good	Poor	Good Start Pct
Kevin	Appier	221	142	.609
Mark	Langston	234	153	.605
Dave	Stieb	231	153	.602
Steve	Rogers	222	153	.592
Larry	Jackson	217	151	.590

That chart reduces to three columns the data that I took great pains to produce in eleven columns, so let’s deal with all eleven. In the chart below the "10s" are the outstanding starts, and the "0s" are the extremely poor starts:

First	Last	10	9	8	7	6	5	4	3	2	1	0
Kevin	Appier	59	49	42	35	36	39	34	43	22	14	29
Mark	Langston	65	47	45	39	38	41	30	31	35	24	33
Dave	Stieb	56	60	46	36	33	28	26	50	24	36	17
Steve	Rogers	61	39	52	42	28	18	45	35	28	24	21
Larry	Jackson	61	45	41	34	36	52	26	39	26	26	34

That chart in that form doesn’t show you very much; it is merely a necessary step toward the analysis that will follow. Stieb had no more outstanding starts than these other pitchers, although he did have fewer truly terrible starts. Let’s look at the average start value for these pitchers:

First	Last	Count	Total	Average
Steve	Rogers	393	2294	5.84
Kevin	Appier	402	2346	5.84
Dave	Stieb	412	2396	5.82
Mark	Langston	428	2446	5.71
Larry	Jackson	420	2356	5.61

No indication there that Stieb is better than the other guys. To the best of my knowledge that’s actually a good way to look at the data. Each step up in this data represents about the same gain in wins for your team as every other step up; therefore a simple average is a valid way to process the data, although it is not the only option. Another way to look at the data is to compare the players to the average, which is 5.0000:

First	Last	Count	Total	Average	Plus
Kevin	Appier	402	2346	5.84	336
Dave	Stieb	412	2396	5.82	336
Steve	Rogers	393	2294	5.84	329
Mark	Langston	428	2446	5.71	306
Larry	Jackson	420	2356	5.61	256

Appier and Stieb are each 336 "steps" above an average pitcher, in their careers. We could also do Performance Above Replacement, assuming that a Replacement-Level pitcher would generally deliver about a "3":

First	Last	Count	Total	Average	Plus	Above Replacement
Mark	Langston	428	2446	5.71	306	1162
Dave	Stieb	412	2396	5.82	336	1160
Kevin	Appier	402	2346	5.84	336	1140
Steve	Rogers	393	2294	5.84	329	1115
Larry	Jackson	420	2356	5.61	256	1096
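All of the summary columns in the charts above can be rebuilt from the eleven-column chart. The sketch below does that, assuming a "Good" start is any level of 6 or higher, a "Poor" start is 4 or lower, the average level is 5 and the replacement level is 3, as described in the text. The function and key names are illustrative.

```python
LEVELS = range(10, -1, -1)  # column order in the chart: 10, 9, ..., 0

counts = {
    "Appier":   [59, 49, 42, 35, 36, 39, 34, 43, 22, 14, 29],
    "Langston": [65, 47, 45, 39, 38, 41, 30, 31, 35, 24, 33],
    "Stieb":    [56, 60, 46, 36, 33, 28, 26, 50, 24, 36, 17],
    "Rogers":   [61, 39, 52, 42, 28, 18, 45, 35, 28, 24, 21],
    "Jackson":  [61, 45, 41, 34, 36, 52, 26, 39, 26, 26, 34],
}


def summarize(level_counts, average_level=5, replacement_level=3):
    """Collapse eleven level counts into the article's summary columns."""
    by_level = dict(zip(LEVELS, level_counts))
    starts = sum(level_counts)
    total = sum(level * n for level, n in by_level.items())
    good = sum(n for level, n in by_level.items() if level >= 6)
    poor = sum(n for level, n in by_level.items() if level <= 4)
    return {
        "starts": starts,
        "total": total,
        "good": good,
        "poor": poor,
        "good_pct": round(good / (good + poor), 3),  # level-5 starts are no-decisions
        "average": round(total / starts, 2),
        "plus": total - average_level * starts,
        "above_replacement": total - replacement_level * starts,
    }


stieb = summarize(counts["Stieb"])
print(stieb)
```

Calling `summarize` on each of the other four rows reproduces their lines in the charts the same way.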

Dave Stieb pitched most of his career for very good teams; he actually pitched for much better teams than these other four pitchers. Stieb came up in 1979; the Blue Jays got to about .500 in 1982, and were always over .500 after 1982. Appier spent all of his best years pitching for Kansas City Royals teams that were almost universally miserable. Rogers, like Stieb, came up with an expansion team, but Rogers pitched for losing teams for the first six years of his major league career. Larry Jackson pitched for the Cardinals while they were in a down phase (they had losing records in 1955, 1956, 1958 and 1959) and, when the Cardinals started to improve, was traded to the Cubs, where he pitched for losing teams in 1964, 1965 and a 100-loss team in 1966. Langston came up with the Mariners in 1985, and pitched for five years (1985-1989) for Seattle teams that were never within 10 games of .500, pitched one year for a .500 team in Montreal, then pitched the second half of his career with Los Angeles Angel teams that were usually under .500 and never very far above .500. (They were under .500 in 1990, 1992, 1993, 1994 and 1996. They were exactly .500 in 1991, and a few games over .500 in 1995 and 1997.)

Dave Stieb, as a Hall of Fame candidate, is no better than Kevin Appier, Mark Langston, Steve Rogers or Larry Jackson, and there are other pitchers in that class—pitchers less perfectly comparable to Stieb, but still generally comparable to Stieb. That group includes Doc Gooden, John Candelaria, Jimmy Key, Orel Hershiser, Andy Benes, Javier Vazquez, Milt Pappas, Fernando Valenzuela and David Wells. These are the Good Start/Poor Start ratios for those pitchers:

First	Last	Count	Good	Poor	Good Start Pct
John	Candelaria	356	200	120	.625
Dwight	Gooden	410	234	141	.624
Jimmy	Key	389	220	138	.615
Orel	Hershiser	466	262	165	.614
Andy	Benes	387	210	137	.605
Dave	Stieb	412	231	153	.602
Javier	Vazquez	443	231	175	.569
Milt	Pappas	462	241	185	.566
Fernando	Valenzuela	424	213	170	.556
David	Wells	489	232	205	.531

OK, Javier Vazquez is a special case; there are unique problems with his career, so let’s take him out of the group so that we don’t get hung up arguing about him. This chart gives the Average Performance Levels, the .500 Plus numbers, and the "Above Replacement Level" numbers for this group:

First	Last	Average	Above .500	Above Replacement
Dwight	Gooden	5.84	344	1164
Dave	Stieb	5.82	336	1160
John	Candelaria	5.72	257	969
Jimmy	Key	5.67	262	1040
Andy	Benes	5.59	230	1004
Orel	Hershiser	5.57	266	1198
Milt	Pappas	5.46	214	1138
Fernando	Valenzuela	5.38	163	1011
David	Wells	5.31	150	1128

But there is another class of pitchers who are distinctly above these pitchers, and who aren’t going to go into the Hall of Fame, either. That class includes David Cone, Kevin Brown, Bret Saberhagen, Luis Tiant, Mickey Lolich, Chuck Finley and Vida Blue. Let’s begin by looking at their Good Start/Poor Start ratios:

First	Last	Starts	Good	Poor	Good Start Pct
David	Cone	419	260	120	.684
Kevin	Brown	476	286	145	.664
Bret	Saberhagen	371	211	122	.634
Luis	Tiant	484	275	170	.618
Dave	Stieb	412	231	153	.602
Mickey	Lolich	496	276	183	.601
Chuck	Finley	467	249	168	.597
Vida	Blue	473	254	179	.587

Cone, Brown and Tiant made more starts than Stieb and with a higher percentage of effective starts. Saberhagen made 10% fewer starts but with a significantly better percentage of good starts, while Lolich, Finley and Blue made significantly more starts with a percentage only slightly lower. Let’s look at the averages:

First	Last	Average
David	Cone	6.24
Kevin	Brown	6.04
Bret	Saberhagen	6.03
Luis	Tiant	5.89
Dave	Stieb	5.82
Mickey	Lolich	5.78
Vida	Blue	5.70
Chuck	Finley	5.68

The margin above average:

First	Last	Above .500
David	Cone	518
Kevin	Brown	495
Luis	Tiant	432
Mickey	Lolich	387
Bret	Saberhagen	382
Dave	Stieb	336
Vida	Blue	333
Chuck	Finley	316

And the margin above replacement level:

First	Last	Above Replacement
Kevin	Brown	1447
Luis	Tiant	1400
Mickey	Lolich	1379
David	Cone	1356
Vida	Blue	1279
Chuck	Finley	1250
Dave	Stieb	1160
Bret	Saberhagen	1124

It was suggested in Dave Fleming’s recent article that David Cone pitched disproportionately against sub-.500 teams, but this method fully and absolutely takes that issue off the table, by adjusting each start for the quality of the competition. I would vote for any of this last group of pitchers for the Hall of Fame before I would even consider voting for Dave Stieb.

7. More About How the Method Works

Each start by every starting pitcher is evaluated essentially by the Game Score, and by a set of adjustments that give the pitcher more credit for pitching in a high-ERA league than in a low-ERA league, more credit for pitching in a hitter’s park than in a pitcher’s park, and more credit for pitching against a strong offense than for pitching against a weak offense. The sum of these adjustments is called the Compensation Score, and the total of the Game Score and the Compensation Score is called the Package Score.

There’s a little more to it than that; in some cases I didn’t trust the Game Score system, either. There are a couple of "outs" or "dodges" in the system; I’ll explain those later. First, let me explain the Compensation Scores. To begin with, I had to study (again) how Game Scores vary with the Run Environment. What is the normal Game Score in a 5.50 Run Environment? What is it in a 2.75 Run Environment?

Well, that’s not the first set of questions I had to deal with. The first set of questions that I had to deal with was, "What is the run-scoring level for each league? What is the Park Factor for each Park? What is the (Park-Adjusted) strength of each team’s offense?" I had to find or create answers to each of those questions, then I had to put those answers into the data file that had the records of each game, then I had to combine the League Run Level and the Park Adjustment into a single number so that I could evaluate the "Run Context" for every game. Once I had done that, then I could figure the normal Game Score for every Run Context.

My conclusion: A Game Score of 50.000 is the norm at a Run Context Level of 4.68 Runs Per Game.

For each Run Per Game above or below 4.68, there is a Gain of 3.5 points in the Average Game Score, obviously running inversely to the Run Level. In other words, at a run context of 3.7 runs, the average Game Score is 53.5; at 5.7 runs per game, the average Game Score is 46.5.

The expected Game Score at any run level can be approximated by the formula (10 minus Expected Runs) * 3.50 + 31.5. In other words, if the expected Run Level is 7.00, the expected Game Score is 42. If the expected Run Level is 6.00, the expected Game Score is 45.5. If the expected Run Level is 5.00, the expected Game Score is 49. If the expected Run Level is 4.00, the expected Game Score is 52.5. If the expected Run Level is 3.00, the expected Game Score is 56.0. If the pitcher pitches better than expected, that’s a Good Start.
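The approximation above is a straight line, so it is easy to write down. This is only a sketch of the stated formula; the function names are illustrative, and the study itself sorts starts into eleven levels rather than applying a single good/bad cutoff.

```python
def expected_game_score(expected_runs):
    """Approximate average Game Score at a given expected run level.

    Direct transcription of (10 - expected runs) * 3.5 + 31.5.
    """
    return (10 - expected_runs) * 3.5 + 31.5


def beats_expectation(game_score, expected_runs):
    """True when the pitcher's Game Score is above the expected level."""
    return game_score > expected_game_score(expected_runs)


for runs in (7.0, 6.0, 5.0, 4.0, 3.0):
    print(runs, expected_game_score(runs))
```

A Game Score of 58 clears the bar easily at a run level of 7.00 (expected 42), while a 59 at a run level of 3.00 (expected 56) is only slightly better than ordinary.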

Expected Run Levels in the study go as high as 10.7 runs, and as low as 2.045 runs. That 10.7 runs sounds crazy, but. . .do the math. That’s the number for Coors Field in the year 2000, facing the San Francisco Giants. The National League average in 2000 was 5.004 runs per game. The Park Factor for Colorado was 166, meaning that the Rockies scored and allowed 66% more runs per game at home than they did on the road.

We have to modify that figure slightly to recognize that the Rockies were the only National League team which did not have Coors Field among their "road" venues, so the functional park adjustment for Colorado is not 1.660, but 1.587; it actually expanded offense relative to the league average not by 66% but by 59%. Still, that expands the offensive context for Colorado, 2000, to 7.944 runs per team per game.

The best offense in the league was the San Francisco Giants, led by Barry Bonds and the MVP, Jeff Kent. Three regulars on that team had an OPS over 1.000 (Bonds, Kent and Ellis Burks), and they played in what was almost the worst hitter’s park in the league, with a park factor of 83. Park-adjusted, the Giants’ offense was about 36% better than the league average; let’s call it 35.88%. So if you expand 7.944 runs per game by 35.88%, you’ve got 10.794 runs per game.
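The Coors Field arithmetic is a chain of three multipliers: league run level, functional park factor, and park-adjusted strength of the opposing offense. A rough sketch using the figures quoted in the text; the function name `run_context` is illustrative, and multiplying the rounded inputs gives about 7.94 for the park step rather than exactly 7.944, presumably because the article's inputs carry more decimal places.

```python
def run_context(league_runs_per_game, park_factor, offense_factor=1.0):
    """Expected runs per team per game for one matchup.

    park_factor and offense_factor are ratios relative to league
    average (1.0 means average).
    """
    return league_runs_per_game * park_factor * offense_factor


# Figures quoted in the article for Coors Field in 2000, facing the Giants.
NL_2000_RUNS = 5.004
COORS_FUNCTIONAL_FACTOR = 1.587
GIANTS_OFFENSE_FACTOR = 1.3588

coors_context = run_context(NL_2000_RUNS, COORS_FUNCTIONAL_FACTOR)
coors_vs_giants = run_context(NL_2000_RUNS, COORS_FUNCTIONAL_FACTOR, GIANTS_OFFENSE_FACTOR)
print(round(coors_context, 2), round(coors_vs_giants, 2))
```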

I see that my data file has them at "only" 10.67 runs per game, so I must have made some other little adjustment somewhere that I have now forgotten. In real life, the Giants only scored 8.29 runs per game in Coors Field in 2000, but then, they LOST six of the seven games. Let me repeat that: The Giants in 2000 scored 8.29 runs per game in Coors Field, and lost six out of seven games. The Rockies outscored them by four runs per game, at 12.29.

If you pitch a 2-hit shutout, that’s a great game in any environment; if you give up 7 runs in 4 innings, that’s a lousy game in any environment. Between the extremes, though, the Run Environment plays a very significant role in how the performance is evaluated.

One game in 11 is evaluated by this system as a "10", meaning that, under the circumstances, this pitcher pitched a hell of a game. 21,957 games are evaluated as "Ten" performances. Remember now that I am not talking about what is normal or typical; I am talking now about what is extremely atypical, extremely unusual.

The worst Game Score that was evaluated as a "10", in this study, was a game pitched by Pedro Astacio of the Rockies in Coors Field on May 10, 1999, not against the Giants but against the Mets, also a good-hitting team in 1999. Astacio pitched 8 innings, giving up 7 hits and 3 runs (all earned), with 7 strikeouts and 5 walks. That’s a Game Score of 58, and a Game Score of 58 would ordinarily be evaluated as a "6" or a "7". But facing that team, in that park, in that season, that’s a tremendous effort. The Rockies won the game 10 to 3.

On the other side of the coin, Ken Forsch of the Houston Astros in 1971 had a game against the San Diego Padres in which he went 8 innings and gave up 9 hits and 2 runs (both earned), with 2 walks and 5 strikeouts. Superficially that is a better game than Astacio’s, with a Game Score of 59, but in this study it is marked down as a "5", a neutral start under the circumstances. The National League average in 1971 was less than four runs per game (3.91). The Astrodome was by far the best pitcher’s park in the league, reducing offense by another 17 or 18%. The Padres had a terrible offense, losing 100 games and finishing last in the National League in runs scored by more than 50. Under those circumstances, a Game Score of 59 is a very humdrum performance.
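For readers who want to check the two Game Scores just quoted, here is a sketch of the standard published Game Score formula: start at 50, add a point per out, two points per inning completed after the fourth, and a point per strikeout; subtract two per hit, four per earned run, two per unearned run, and one per walk. That this is exactly the version used in the study is an assumption; the function name and argument names are illustrative.

```python
def game_score(outs, hits, runs, earned_runs, walks, strikeouts):
    """Standard Game Score for one start (assumed to match the study's version)."""
    innings_completed = outs // 3
    score = 50
    score += outs                               # +1 per out recorded
    score += 2 * max(0, innings_completed - 4)  # +2 per inning completed after the 4th
    score += strikeouts                         # +1 per strikeout
    score -= 2 * hits                           # -2 per hit
    score -= 4 * earned_runs                    # -4 per earned run
    score -= 2 * (runs - earned_runs)           # -2 per unearned run
    score -= walks                              # -1 per walk
    return score


astacio_1999 = game_score(outs=24, hits=7, runs=3, earned_runs=3, walks=5, strikeouts=7)
forsch_1971 = game_score(outs=24, hits=9, runs=2, earned_runs=2, walks=2, strikeouts=5)
print(astacio_1999, forsch_1971)
```

The two raw scores differ by a single point; everything that separates a "10" from a "5" here comes from the league, park and opponent adjustments.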

Two more things. I told you that, in this study, I didn’t always trust the Game Score system. I modified the results above in two ways. First, if a pitcher pitched nine or more innings in a game and allowed no runs, earned or un-earned, that was always scored as a "10", no matter what the other details were.

I believe in the Game Score system; that is to say, I believe it is appropriate to look at the details of the performance, in asking how well a pitcher has pitched, as well as looking at the bottom line. But a shutout in baseball is an absolute, and absolutes are not relative. If you give up no runs in a game, you absolutely cannot lose. If you pitch 9 innings in a game, you absolutely cannot be expected to do more. In a 9-inning game, you can’t do better than pitch a shutout; you can do it more impressively, but you can’t do any better.

On a zero-to-ten scale, I just would not be comfortable scoring a shutout as anything other than a "10", no matter what the other details were. The least impressive shutout in my data was a game pitched by Dave Freisleben of the San Diego Padres on May 29, 1976. Freisleben gave up 10 hits in that game, walked 4 batters and struck out only 3. The game was pitched in a league in which the ERA was low (3.50) and in a park in which the Park Run Factor was 76, which is extremely low. The game was pitched against the San Francisco Giants, a poor hitting team in 1976. Altogether. . .not that impressive.

Still, the man pitched a shutout. You can’t do any better. This rule—that a shutout has to be scored as a "10"—has little practical effect in our system, because

a) The overwhelming majority of complete game shutouts (88%) would be scored as "10s" anyway, even without this rule, and

b) Those which wouldn’t be scored as 10s would almost all be scored as "9s".

Getting rid of this rule would have very limited effect on how pitchers were evaluated; it would merely make small changes in how 4/10ths of one percent of the games were evaluated. I just wouldn’t be comfortable not giving a pitcher who pitched a shutout a solid A.

The other little retreat from Game Scores has to do with pitchers who are knocked out of the game early. In the Game Score system, a pitcher starts the game at "50", and goes up or down depending on whether he gets outs or gives up hits, walks and runs. Given the general goal of the Game Score system, to mirror the impact of the pitcher’s performance on the won-lost expectation of his team, that seems like the right approach.

For this particular purpose, it seems problematic. If a pitcher takes the mound, strikes out the first batter, and then has to come out of the game, that’s a Game Score of 51. If he walks the first batter and then has to come out of the game, that’s a Game Score of 49.

But when a team puts a pitcher on the mound, they’re counting on him to get some outs. If he doesn’t get anybody out, that’s not an "average" performance; that’s a disappointing turn of events.

I decided, in this study, to mark down pitchers who are knocked out of the game in the first five innings, by subtracting one point for every out by which they fall short of 15 outs. The pitcher who strikes out the first batter and then comes out of the game, who otherwise would be at 51, which might be a Level 5 game, is 14 outs short and now goes down to 37, which would be more likely to be a Level 3 game. The pitcher who walks the first batter and then comes out is 15 outs short and drops from 49 to 34.
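The two departures from the raw Game Score can be sketched as a pair of small helpers. The names are illustrative; the rules are taken from the description in this section (a scoreless start of nine or more innings is always a "10", and a start shorter than 15 outs loses one point per missing out).

```python
def is_automatic_ten(outs, runs):
    """Shutout rule: nine or more innings with no runs of any kind."""
    return outs >= 27 and runs == 0


def short_outing_adjusted(game_score, outs):
    """Short-outing rule: subtract one point for each out short of 15."""
    shortfall = max(0, 15 - outs)
    return game_score - shortfall


# The two one-batter examples from the text, using the Game Scores it quotes.
print(short_outing_adjusted(51, outs=1))  # first batter retired, then removed
print(short_outing_adjusted(49, outs=0))  # first batter walked, then removed
```

A start that lasts five innings or more passes through `short_outing_adjusted` unchanged.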

Again—not a huge impact on the system. This rule affects many more games than the other rule. The "shutout" rule affects 1,078 games. The "short outing" rule affects 49,177 games, although it probably only changes the rank of a small percentage of those. Most of the pitchers who score very poorly by Game Score were knocked out of the game early. When we discount their performance, we are also discounting most of those who would score about the same, then we rank them in comparison to one another, so the net effect is very limited, much like a double foul in basketball. Still, I’m not comfortable with the possibility of a pitcher being credited as pitching a good game when he may not have gotten anyone out. I put in this rule to eliminate that possibility.

COMMENTS (31 Comments, most recent shown first)

hotstatrat
I second Dan's sentiments - I loved those biographies you wrote, Bill, in the early 1990s. I wish you could duplicate yourself, so you could get one of yourselves back on that project.
2:17 AM Jun 19th

KaiserD2
sorry, I inadvertently interrupted myself--two guys with the same RAA--really RBA, or Runs below Average--would show in baseball-reference with the same WAA, even if one pitched 50-70 innings more than the other. That didn't make any sense to me. I consistently show more WAA than they do for guys with heavy work loads, yes. But to me, that makes perfect sense.
DK
4:06 PM Jun 18th

KaiserD2
I am not assuming the guy pitches nine innings. I'm simply recognizing that every nine innings = one game. What I immediately noticed working with the baseball-reference data was that by their method, two guys with the same RAA (
4:04 PM Jun 18th

tangotiger
"Then I do something baseball-reference clearly does not do: I multiply that percentage, minus .5, times (IP/9) for that pitcher, to get a figure for his wins above average. I hope that’s clear. "

Well, they UNCLEARLY do do that! Or at least some variation thereof. That's what WAA is. Spahn is at 6.6 in 1947 and 6.2 in 1953 and 4.7 in 1951 and 3.9 in 1952. So, it agrees with the 4 rankings, but just not to the same degree as you do.

***

The difference is you assume the pitcher will pitch 9IP each game in order to get the pythag win%, whereas BR.com (and me and others) will use the pitcher's actual IP/GS, and pad the rest with league average performance. And then multiply by GS rather than IP/9.

Just different ways to get there...
10:50 AM Jun 18th

KaiserD2
OK, I decided to get this out of the way now.

My method for evaluating pitcher’s seasons is based upon data in baseball-reference.com, although I adapt it somewhat. They have a stat which answers the question, for a given pitcher (say, Warren Spahn), in a given season, how many runs would an average pitcher, with the same defense behind him, pitching against the same opposition in the same ballparks, have allowed in the number of innings he pitched? Then they give the number of runs the pitcher actually allowed. Those two numbers, for me, provide the basis for computing a Pythagorean won-loss percentage for that pitcher for that year. Then I do something baseball-reference clearly does not do: I multiply that percentage, minus .5, times (IP/9) for that pitcher, to get a figure for his wins above average. I hope that’s clear.
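As a rough sketch of the arithmetic just described: the comment does not say which Pythagorean exponent is used, so the classic exponent of 2 is assumed here, and the function name and the inputs in the example are hypothetical rather than taken from any real pitcher's record.

```python
def wins_above_average(avg_pitcher_runs: float, actual_runs: float,
                       innings_pitched: float, exponent: float = 2.0) -> float:
    """Pythagorean percentage, minus .5, times (IP / 9).

    avg_pitcher_runs: runs an average pitcher would have allowed in the
        same innings, parks, and defensive context.
    actual_runs: runs the pitcher actually allowed.
    exponent: assumed to be 2; the original comment does not specify it.
    """
    scored = avg_pitcher_runs ** exponent
    allowed = actual_runs ** exponent
    pct = scored / (scored + allowed)
    return (pct - 0.5) * (innings_pitched / 9)


# Hypothetical season: an average pitcher allows 100 runs in 270 innings,
# this pitcher allows 80.
print(round(wins_above_average(100, 80, 270), 2))
```

With these made-up inputs the result is a little under 3.3 wins above average. Because the last step multiplies by IP/9, two pitchers with the same percentage get more credit the more innings they throw.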
Now based on that method, Warren Spahn’s best seasons are 1947 (7.7 WAA), 1953 (7.3), 1961 (5.3), and 1952 (4.2). Because Bill started his study in 1952 he missed two of those seasons. After that, however (using the calculations from baseball-reference.com, which differ a little from mine in some cases) Spahn reached 3 WAA only once, in 1962. He had a long string of seasons with between 1 and 2.6 WAA, including all the years in which the Braves won or contended for the pennant. That kind of pattern, by the way—a few really super seasons early in a career, followed by a steady string of 1-3 WAA seasons—is very common for HOF pitchers.
Now Dave Stieb’s best seasons, by this method, are 1984 (6.1 WAA), 1985 (5.3), 1982 (5.0), 1983 (4.9), 1990 (4.1), and 1981 (if you project his 2.9 WAA out to 162 games, you’ll get about 4.2 WAA). His next two best seasons were 2.9 and 2.8 and then he drops below two. In other words, Stieb’s three best seasons weren’t as good as Spahn’s were (their third best is a virtual tie), but his next five were significantly better than Spahn’s next five. That’s why I said he was more dominant in his best years and I stand by that statement (although I may have slightly overstated it.)
I cannot take the time to compare all the pitchers whom Bill compared to Stieb by my method. I can however tell you that there are many pitchers in the HOF whose 8 best seasons, in terms of WAA, are not nearly as good as Stieb’s best, but who had the good fortune to play for dynasties, and to have much longer careers. I do think that Stieb would have had a good chance for the HOF if he could have won 200 games, but as it is, he has none, and I continue to believe that is unfortunate.

I look forward to the rest of the series.

9:40 AM Jun 18th

KaiserD2
I just wrote a long comment explaining my own views on Stieb and Spahn (I was gratified, obviously, that Bill decided to look into it and that his conclusions were similar.) When I tried to post it I got an error message and found the whole comment was lost in cyberspace. I don't have time to do it again right now, but I'll try to do it tomorrow.

best, DK
8:41 AM Jun 18th

belewfripp
When you account for both the league run environment and the offensive prowess of the opponent, aren't you double-counting? In other words, one of the reasons why the NL run scoring average in 1971 was 3.91/game was because the Padres stunk on offense? And so factoring both in is taking the Padres' ineptitude into account twice? Granted, there were many other teams whose performance create that 3.91 average, but shouldn't the run environment be considered the environment excluding each game's opponent if another adjustment is then going to be made for the specific opponent in each game?
9:47 PM Jun 17th

tangotiger
I always like to start my "recent" baseball history with players born since 1931 (Mays/Mantle). I think that's a useful line, as there are few superstars born within a few years before that, but plenty born after that. Racial integration is in full swing.

And on top of two of the greatest ballplayers of all time, there was also Ernie Banks and Eddie Mathews and Jim Bunning, HOF. And Ken Boyer, borderline HOF-type. I can't tell you how many times I've used 1931 as my demarcation line. I didn't have to look up any of these facts, because I've used it so often.

And just this very second, I learned someone else born in 1931: Larry Jackson.

8:11 PM Jun 17th

OldBackstop
Never heard of Larry Jackson, but I see that the day after I was born, Duke Snider really made him his bitch. On March 28, 1961, Duke took him deep at Dodgertown, then hit a double off him, then broke his jaw with a broken bat, and made him wear this silly hat around all the scrubs for the rest of spring training:
news.google.com/newspapers?nid=1129&dat=19610328&id=JJ5RAAAAIBAJ&sjid=PmwDAAAAIBAJ&pg=4745,6775951
7:24 PM Jun 17th

mskarpelos
A clarification on Bill's earlier comment. Larry Jackson was never a member of the US Congress; he was a member of the Idaho state legislature. He also died young (just 59 years old in 1990) which may be another reason many fans don't remember him. I became a fan when I was 8/9 years old back in 1969, so he retired a year before I could have seen him. I'm with Tango. I should have known about him, but until I looked him up today, I had no clue who he was.
6:37 PM Jun 17th

those
Bill, do you know why Ramos was able to pitch so many innings? One year in the 1950s, between MLB and winter ball, he was over 400.
3:09 PM Jun 17th

tangotiger
It sounds like Ken Dryden, for those of you who follow hockey. A HOF player who retired at his peak.
1:06 PM Jun 17th

bjames
Jackson retired to pursue his second career.
8:50 AM Jun 17th

cderosa
I’m with Tango: I became baseball-aware in the 1970s, have been deeply interested in baseball history ever since, and yet only first heard of Larry Jackson in the New Bill James Historical Abstract.

This is funny to me because I actually did this back in 2001, when that book came out: I jotted down Bill’s highest rated player at each position about whom I knew nothing, other than maybe my eyes passed over their names on lists of baseball players, and looked them up. This is what I came up with:

C Duke Farrell, #45
1B Elbie Fletcher, #46
2B Tony Cuccinello, #53
3B Buddy Lewis, #40
SS Freddy Parent, #63
LF George Stone, #63
CF Mike Griffin, #45
RF Tommy Holmes, #58
P Larry Jackson, #89

And when I start to think about how I acquired knowledge of baseball history, it becomes easy to see how Jackson, at least, slipped through the cracks for me.

My introduction to the outline of baseball history were books like John Rosenburg, The Story of Baseball, and some Ritter/Honig books. I still have the Rosenburg book, and Larry Jackson is not in the index. Larry Jackson appears only once in the four editions of the Fireside Book of Baseball (vol. 3, he’s playing cards with Jim Brosnan). Jackson is not mentioned in the first Historical Abstract from 1985, and I can’t remember him mentioned in any of Bill’s annuals. Although Bill mentions that he was instrumental in establishing the Player’s Association, Larry Jackson does not appear in the index of Marvin Miller’s autobiography.

I’m a Yankee fan. As a National League guy on also-ran teams, Jackson isn’t in any Yankee stories: never pitched for the Yankees, against the Yankees, or featured in any fist fights with Billy Martin. So he’s not in any of the couple of dozen books I’ve read about the Yankees.

Even when I lived in Philadelphia and followed the Phillies every day, the broadcasters didn’t tell Larry Jackson stories.

I started collecting baseball cards in 1974, at age 5. No Larry Jackson baseball cards.

I played Strat-O-Matic baseball and grabbed any old cards I could find at garage sales and such. No LJ.

At one point, I memorized lists of award winners. Larry Jackson didn’t win awards.

I loved looking at the rosters of the great teams and committed them to memory. I liked to read about great World Series contests and famous pennant races. Negative Larry Jackson. He was on the Cardinals before their three Sixties pennants, on the Cubs before their 1969 division title, and on the Phillies after their 1964 collapse.

I liked (still like) to debate who was the greatest this or that of all time. Jackson wasn’t good enough to be in those discussions.

My best chance of encountering Larry Jackson was by flipping around the Big Mac, which I liked to do sometimes. I never landed squarely on him, I guess. Maybe not enough black ink or a colorful enough name to catch my eye.

Just as a consequence of time, place, and available modes of inquiry, it’s possible (inevitable, really) to have big blind spots even about a subject in which you are really immersed. That’s why Bill’s take on expertise has always spoken to me. I like to tell my own students I’m not an “expert” on what we are studying, but, like them, a student of the subject. There’s always more out there.

By the way, for those who do remember him, why did Larry Jackson hang it up after hurling 243 innings of better-than-average ERA baseball? Hurt? Didn't want to play in Montreal?

7:19 AM Jun 17th

tangotiger
Dan: I would suggest that someone who goes through the historical records as much as I have would have far more exposure than the typical baseball fan who learns of baseball history through books and wikis.

And you made the point for me: an organization dedicated to preserving baseball history decided to write at least 3000 biographies and still not be able to get to Larry Jackson.

Your n=1 and my n=1 experiences mean nothing in the face of that.
6:06 AM Jun 17th

OldBackstop
Excellent work, Bill. I have always lusted for something along the lines of a "Saber" Quality Start, and it seems like you have done all the heavy lifting here. Now we need to brand it....I'm thinking "AWESOME": Almost Without Earned Scores Or Many Errors......erm...well, it's a start...
12:44 AM Jun 17th

danfeinstein
Tom ~ I'd argue that great players are well remembered; it's the average, or a little better / worse that get forgotten quickly. Just at dinner tonight, I had a 30 minute conversation (with a former Secretary of the Navy) that was highlighted by a lengthy comparison between Robin Roberts, Frank Tanana, and Stephen Strasburg. If two random people sitting at dinner can talk about Frank Tanana, a non-HOF player whose best years are more than 35 years back, I don't fear for good but not great players disappearing into the ether. In the Ask Bills right now, there is a lot of talk about areas of expertise. For better or worse, it seems that yours is much more on the analytical side than the historic; I would suggest that someone with 1/10th as much historic knowledge as you have of the analytic - even one of your generation - would well know Larry Jackson.

Bill ~ One of the things I most enjoyed about your writing was the relatively brief period when you wrote a bunch of short biographical essays in the Baseball Books of 1990-92 and several more in the NBJHBA. I'm not ballsy enough to tell you what to write about, or certainly not suggest what is commercially viable, but did want to let you know that was an entry point for my interest in more adult oriented biographies of the players and wished that you hadn't stopped.
8:46 PM Jun 16th

evanecurb
And his brother Reggie hit over 500 home runs.
5:12 PM Jun 16th

rgregory1956

Probably just me, but shouldn't the Category 5s be counted as half a win/half a loss when figuring the Good Start Percentage?

5:02 PM Jun 16th

bjames
The irony of Dan's post is astounding! I mean, three THOUSAND biographies and not a single one on a guy on the same plane as Dave Stieb, who is one of the 10 or 15 pitchers I think of as representing my youth.

Not to mention that he was a member of the United States Congress for several years, and was one of the leaders in the Union effort that brought Marvin Miller into the game.
4:17 PM Jun 16th

tangotiger
Finally, if you do end up adopting the +2 per out, you can adjust that +2 to whatever it is for the park and league factors, so that you always get the "50" you want. Maybe for Hershiser in 1988, it'll be +1.8, and maybe at Coors it's +2.4 or something. It makes life kind of easy to just adjust the one parameter like that.
3:45 PM Jun 16th

tangotiger
The irony of Dan's post is astounding! I mean, three THOUSAND biographies and not a single one on a guy on the same plane as Dave Stieb, who is one of the 10 or 15 pitchers I think of as representing my youth?

I couldn't have made a better case as to how messed up we are in remembering great players.
3:21 PM Jun 16th

tangotiger
Bill: not that it matters a whole lot, but since you put in the check anyway, did you notice that your Game Score system can now be rewritten as:

starting everyone at 35 points
+6 points for each of the first 4 innings
+8 points for the 5th inning
+5 points for the 6th and later innings

In short, you now have a SMALLER benefit for the later innings, and the largest is the 5th inning.

It made sense back when you had the Game Score system starting at 50 points to give the bonus for going deep in the game. But now, you've gone the other way here in giving so much bonus at being "above replacement".

In short, you could probably simply turn the whole thing into +2 points per out recorded, and you'll be in the same spot without the complications of the penalties and bonuses.
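The rewrite above can be checked mechanically. The sketch below assumes the standard Game Score terms that depend on outs (start at 50, +1 per out, +2 per inning completed after the fourth) plus the short-outing markdown of one point per out short of 15. Hits, walks, runs and strikeouts are left out because they are identical in both forms, only complete innings are compared, and the function names are made up for the example.

```python
def innings_component_with_penalty(full_innings: int) -> int:
    """Outs-related part of Game Score, including the short-outing markdown."""
    outs = 3 * full_innings
    bonus = 2 * max(0, full_innings - 4)   # +2 per inning completed after the 4th
    penalty = max(0, 15 - outs)            # -1 per out short of 15
    return 50 + outs + bonus - penalty


def innings_component_rewritten(full_innings: int) -> int:
    """The same quantity in the rewritten form: 35, then +6/+6/+6/+6/+8/+5/+5..."""
    total = 35
    for inning in range(1, full_innings + 1):
        if inning <= 4:
            total += 6
        elif inning == 5:
            total += 8
        else:
            total += 5
    return total


# The two forms agree after every complete inning from 0 through 9.
for n in range(0, 10):
    assert innings_component_with_penalty(n) == innings_component_rewritten(n)
```

The +6 in the early innings is the usual +3 for the outs plus 3 points of penalty avoided, which is why the per-inning value drops once the penalty no longer applies.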

3:19 PM Jun 16th

danfeinstein
We should be moving past the binary slotting of players, in/out, yes/no. There are layers to their accomplishments that should let us remember guys like Stieb and Rogers (or Chris Carpenter of the current generation), and not let their fates end up as it has (for me) with Larry Jackson.

Here is a plug for SABR's biography project which has compiled nearly 3,000 peer reviewed biographies of players, executives, coaches, scouts, organists, announcers, and others involved in the game. http://sabr.org/bioproject (Unfortunately, among the ~3000, there isn't one written yet for Larry Jackson; perhaps one of the readers here would like to start.)
3:08 PM Jun 16th

MarisFan61
Bill: When you did a study a few years ago on "Consistency" of pitchers, I said (maybe not on here, maybe just on Reader Posts, but I think in both places) that you were missing the point of what people mean when they talk about pitchers being "consistent" or (more often) complain about their being "inconsistent" -- which it seemed was what you were wanting to look at -- by evaluating their 'consistency' in the precise metric way that you did. I would say that what they mean, most of the time, is related to this thing that you found here: "Dominant pitchers almost never actually have bad games." (emphasis added) What they're talking about is how regularly the pitcher has "good games" as opposed to "bad" ones. In the earlier study, you found things like that Greg Maddux wasn't highly consistent. I thought that missed the point that you were trying to evaluate. This thing you're saying there in this article about "having bad games" is much closer to what people mean, and actually I think it's exactly what they're talking about.

1:03 PM Jun 16th

bjames
We should be moving past the binary slotting of players, in/out, yes/no. There are layers to their accomplishments that should let us remember guys like Stieb and Rogers (or Chris Carpenter of the current generation), and not let their fates end up as it has (for me) with Larry Jackson.

I agree.
12:15 PM Jun 16th

mrbryan
Jackson was elected to the Idaho legislature after he retired. He is also the answer to the question - "Who did the Cubs give up to get Fergie Jenkins?"
10:33 AM Jun 16th

tangotiger
When I first saw "Larry" Jackson, I was thinking "Danny" Jackson, which made no sense to put him in the company of the other guys.

Now that I see who Bill is talking about, Larry ended his career a decade before I became a fan. And I've never heard of him before today.

That I think is the fate that awaits Stieb, Rogers, Appier and other pitchers in this class. I think the way some tiny number of players are elevated (like Catfish Hunter), while other players of similar accomplishments are left behind is a black mark in how we preserve history.

We should be moving past the binary slotting of players, in/out, yes/no. There are layers to their accomplishments that should let us remember guys like Stieb and Rogers (or Chris Carpenter of the current generation), and not let their fates end up as it has (for me) with Larry Jackson.

9:44 AM Jun 16th

CharlesSaeger
Ungodly minor quibble: a game where a starter strikes out one batter and then comes out scores as 52, not 51. He gets a point for the out and another point for it being a strikeout.
9:25 AM Jun 16th

Zeth
And every last one of those guys was better than Jack Morris.
8:37 AM Jun 16th

tangotiger
In my versions of Game Score, I make the starting point "40" for the very reason that Bill is talking about in the end. If you play around with it, you will likely conclude that the starting point should be anywhere from 30 to 40. So, I'm glad that Bill has made the adjustment.
7:51 AM Jun 16th
