Season Scores

By Bill James

August 12, 2007

Before we get started here, let me ask you a few random questions, and I’ll ask you to pull out an encyclopedia or whatever source you use to evaluate these things. What was the best season of Stan Musial’s career? What was the best season of Jerry Remy’s career, or Phil Rizzuto’s, or Roberto Alomar’s, or Kevin Appier’s, or Tony Armas’, or Rich Aurilia’s?

What were the three best seasons by hitters in the 1920s? The 1950s? The 1970s?
What were the three best seasons by pitchers in the 1930s? The 1960s? The 1990s?

What is the best season by a hitter in the history of the Detroit Tigers? What was the best season by a Tiger in the 1950s? The 1960s? What was the best season by a Cincinnati Reds’ player in the 1980s?

What was Dennis Eckersley’s best season, or Hoyt Wilhelm’s, or John Smoltz’s? What was the best season by a Boston Red Sox pitcher, starter or reliever, in the 1960s? By a St. Louis Cardinals’ pitcher in the 1990s? Just pick the season you like best, on whatever basis you choose. Write down your answers, if you want to, because there will be a test later on—not a test for you, but a test for me. We’ll assume that you are right, and then we’ll check my answers to see how I did.

OK, I have this system for Season Scores. It’s hard to explain the need for this system, probably because there isn’t any real need for this system; the world is already overrun with unmemorable statistical rating systems, and this is just another one.

Well, it’s useful to me, OK? I like it. This came about, sometime in the late 1990s, because I needed a system to quickly and easily identify the best season by a pitcher within a group of seasons. There’s no easy reference here. The sort of “technical” evaluation is, perhaps, Runs Saved Against Average, or Wins Above Average, or Win Shares. . .something like that. But let’s take these two seasons:

Pitcher, Season	G	IP	W-L	Pct.	SO	BB	ERA
Joe Coleman, 1971	40	280	20-9	.690	236	96	3.15
Brad Radke, 1996	35	232	11-16	.407	148	57	4.46

If you adjust for everything you are supposed to adjust for, Radke’s season is actually better than Coleman’s. The American League ERA was 3.46 in 1971, 4.99 in 1996, so Radke was actually further below the league ERA than Coleman was, either in raw totals or as a percentage. Tiger Stadium in ’71 had a Park Run Factor of .91, whereas Minnesota in ’96 was 1.08, so if you adjust for that, Radke is actually much better. Coleman was about 5 runs better than an average pitcher in his context; Radke was about 19 runs better.
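The "runs better than average" figures can be roughly reproduced from the numbers quoted above. This is only a sketch: it assumes the park factor is applied at half strength (on the theory that a pitcher works about half his games at home), which is a common convention but is not stated in the text, and the function name is made up for illustration.

```python
def runs_better_than_average(era, innings, league_era, park_factor):
    """Runs saved relative to an average pitcher in the same league and park.

    Assumption (not from the article): the park run factor is applied at
    half strength, since roughly half of a pitcher's innings are at home.
    """
    context_era = league_era * (1 + park_factor) / 2
    return (context_era - era) * innings / 9


# League ERA and park factors are the ones quoted in the paragraph above.
coleman_1971 = runs_better_than_average(3.15, 280, league_era=3.46, park_factor=0.91)
radke_1996 = runs_better_than_average(4.46, 232, league_era=4.99, park_factor=1.08)

print(round(coleman_1971), round(radke_1996))
```

Under that assumption the two results land at about 5 and about 19 runs, in line with the figures in the text.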

That statistics do not always mean what they seem to mean at a glance is a core tenet of sabermetrics, so I’m certainly not denying that. On the other hand, you have to admit: 20-9 with a 3.15 ERA and 236 strikeouts is a hell of a lot better than 11-16 with a 4.46 ERA and 148 strikeouts. Which guy do you want on your fantasy team next year: the guy who goes 20-9 with 236 strikeouts and a 3.15 ERA, or the guy who goes 11-16 with a 4.46 ERA and 148 strikeouts?

There has to be some way to “score” seasons based not on exactly how good they are in context but on how they look independent of context. Not everybody chooses to adjust for everything, you know. If you adjust for everything you can adjust for, Boog Powell was no doubt a better hitter than Jim Bottomley. Nonetheless, Bottomley is in the Hall of Fame, and Powell isn’t. Amos Otis was probably a better player than Sam Rice. That’s life, Amos; don’t be expecting to get The Call.

Very often, when you’re doing research, you want to quickly and reasonably identify a player’s best season, or identify the best hitter on a team, or identify the best pitcher on a team. You know who it is, just by glancing at the stats. . .any idiot could look at the stats and tell you who it is, 19 times in 20. You don’t have to be a rocket scientist to figure out that 2001 was Rich Aurilia’s best season.

I needed a quick-and-dirty method to identify a pitcher’s best season without adjusting for the influence of the moon. Once I had that system, I found that I was using it every day. I started to want one for batters as well. It’s kind of an “any idiot would choose” system. Look at Delino DeShields, 1992 and 1993:

Year	G	AB	R	H	2B	3B	HR	RBI	SB	Avg.
1992	135	520	82	155	19	8	7	56	46	.292
1993	123	481	75	142	17	7	2	29	43	.295

Which is the better season? If you figure runs created, because DeShields had more walks in less playing time in ’93, the two seasons are even, 80 runs created each season. By Runs Created Against Average, by Runs Created per 27 outs, OPS, anything like that, DeShields’ ’93 season is actually better than ’92.

Nonetheless, anybody is going to look at those stats and say “Well, his average was a little higher in ’93 than it was in ’92, but only 3 points. He had more runs scored in ’92, almost twice as many RBI, more doubles, triples, home runs, stolen bases. . .obviously ’92 is the better season.”

I’ve made a tactical mistake here (he says, blundering ahead with the essay.) I have introduced Season Scores to you as a system that gets the answer wrong. That’s not what it is at all. It took me a half-hour to find these examples. If you just pick two seasons at random and ask which one is the better season, Season Scores is going to pick the same one as a sophisticated analyst will certainly more than 95% of the time, and probably more than 99% of the time. Even when it is wrong, there is some sort of reasonable basis for it, in the sense that it is reasonable to say that 20-9 with a 3.15 ERA is better than 11-16 with a 4.46 ERA. It’s not that hard. It’s just a simple “scale” to string batters along between Bill Bergen and Babe Ruth.

Here’s the formula. I’ll explain the pitcher’s system first because:

1) I developed it first, and
2) The batter’s system is a little bit more complicated.

The pitcher’s season score is the sum of three parts, which are:

Part I—Decisions

10 times wins, plus 3 times saves, minus 5 times losses.

Part II—Earned Runs

Earned Runs Saved as compared to a pitcher pitching the same number of innings with an ERA of 5.00.

Part III—Strikeouts and Walks

2 times Strikeouts, minus 3 times Walks, the total divided by 3.

That’s all. Any of the three parts CAN yield a negative score, so there are pitchers whose “season scores” are negative. You get a negative score if you have a winning percentage under .333, a negative score if you have an ERA over 5.00, and a negative score if your strikeout/walk ratio is worse than 1.50 to 1. My intention was that the zero point should more or less coincide historically with the replacement level, as best one can gauge that over the long sweep of history.
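As a minimal sketch, the three parts can be added up like this. The class and function names are invented for illustration; earned runs are backed out of ERA and innings because the table above does not list them; and saves are set to zero for Coleman and Radke as an assumption, since the table has no saves column.

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    wins: int
    losses: int
    saves: int
    innings: float
    era: float
    strikeouts: int
    walks: int


def pitcher_season_score(p: PitcherLine) -> float:
    # Part I: decisions
    decisions = 10 * p.wins + 3 * p.saves - 5 * p.losses
    # Part II: earned runs saved versus a 5.00 ERA over the same innings.
    # Earned runs are reconstructed from ERA here (an approximation).
    earned_runs = p.era * p.innings / 9
    runs_saved = 5.00 * p.innings / 9 - earned_runs
    # Part III: strikeouts and walks
    control = (2 * p.strikeouts - 3 * p.walks) / 3
    return decisions + runs_saved + control


# Lines taken from the Coleman/Radke table; saves=0 is an assumption.
coleman = PitcherLine(wins=20, losses=9, saves=0, innings=280, era=3.15,
                      strikeouts=236, walks=96)
radke = PitcherLine(wins=11, losses=16, saves=0, innings=232, era=4.46,
                    strikeouts=148, walks=57)

print(round(pitcher_season_score(coleman)), round(pitcher_season_score(radke)))
```

Under these assumptions Coleman's 1971 comes out near 274 and Radke's 1996 near 86, which is the "any idiot would choose" ordering the system is meant to produce.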

In the history of baseball through 2006 there have been 34,689 pitcher/seasons, of which 6,722 had negative scores on Part I, 10,571 had negative scores on Part II, and 20,323 had negative scores on Part III. 3,146 pitchers have negative scores in all three areas, but 9,515 have positive scores in all three areas.

On Part I, the highest score in history is by Hoss Radbourn in 1884, when he was 59-12 or 60-12, depending on the source, and the lowest score is by John Coleman in 1883, when he was 12-48.

On Part II the highest score in history is, again, by Old Hoss in ’84, and the lowest score ever is by Les Sweetland in 1930 (7.71 ERA in 167 innings.)

On Part III the highest score ever was by One Arm Daily in 1884 (Radbourn is 4th), and the lowest ever was by Ed Crane in 1890 (117 strikeouts, 210 walks.)

The highest-scoring season of all time was, of course, Radbourn’s. It scores at 1002. Going 59-12 with a 1.38 ERA. . .that’s a tough season to beat. The worst pitcher’s season of all time was by Frank Bates in 1899, mostly with the Cleveland Spiders. Bates finished 1-18, struck out 13 batters, walked 110, and posted an ERA of 6.90. Negative 216.

Nineteenth century pitching lines—and for that matter, early twentieth century pitching lines—are, of course, entirely unrelated to modern baseball. The highest pitcher’s score since 1900 was by Ed Walsh in 1908 (40-15, 1.42 ERA, scores at 651).

The highest pitcher’s score since Walsh was by Walter Johnson in 1913 (36-7, 1.14 ERA). That scores at 603.

The highest score since then was by Pete Alexander in 1915 (33-12, 1.22 ERA). That scores at 524.

The highest score since Alex was by Sandy Koufax in 1965 (26-8, 2.04 ERA, also 382 strikeouts). That scores at 520.

The highest score since Koufax was by Denny McLain in 1968 (31-6, 1.96 ERA.) That scores at 517.

The highest score since they raised the mound in ’69 was by Steve Carlton in ’72 (27-10, 1.97 ERA, 310 strikeouts.) That scores at 456.

The highest score since Carlton was by Pedro Martinez in 1999 (23-4, 2.07 ERA, 313 strikeouts against 37 walks.) That scores at 451.

The highest score since Pedro was by Randy Johnson in 2002 (24-5, 2.32 ERA, 334 strikeouts.) Johnson’s teammate Curt Schilling also scored at 424 the same year—an absolutely phenomenal tandem. If you add their numbers together, they don’t come real close to matching Old Hoss Radbourn by himself.

The highest score since 2002 was by Johan Santana in 2004 (20-6, 2.61 ERA, 265 strikeouts.)

You must be thinking that the Season Score for a pitcher tends to predict the Cy Young Award. Well, yes, of course, it does, but it isn’t really designed to do that, and I don’t want to dwell on that, at least right here. I have made up numerous systems to predict Cy Young voting. When I do that I try to accommodate the little twists and turns in the voting, trying to figure some way to get Vern Law in 1960 (20-9, 3.08 ERA) ahead of Ernie Broglio (21-9, 2.75 ERA). Here, I didn’t specifically consider the Cy Young voters (although incidentally, because of the strikeout/walk adjustment, Law does score ahead of Broglio--but behind Lindy McDaniel and behind Bob Friend.) The Season Score does predict the Cy Young vote 60, 70% of the time, but I just didn’t really worry about that.

In the history of baseball there have been 5,487 pitchers who pitched 200 innings in a season. The average score for those pitchers is 152. The average score for a pitcher pitching 100-200 innings is 55 points.

The batter’s season score is the sum of four components:

Part I—Runs Scored and RBI

Runs Scored plus RBI, minus outs made divided by 7.

Part II—Hits

Hits, divided by 2, minus outs made divided by 7.

Part III—OPS and Batting Average

Part IV—Extra Base Hits and Stolen Bases

Extra Base Hits plus Stolen Bases, minus 2 times Caught Stealing.

On Part I, the highest-scoring season of all time was by Babe Ruth, 1921 (298), and the lowest (excluding pitchers, or maybe including them, I don’t know) was by Bill Bergen in 1906 (-16). About 42% of the points for hitters are awarded under Part I.

On Part II (Hits), the highest-scoring season of all time was by Hugh Duffy, 1894 (74) and the lowest was by Bill Bergen in 1909 (-21). A hitter gets a negative score on Part II if he hits for an average lower than .222, or if he has extra outs (GIDP, CS, Sacrifices) that would drive his batting average below .222.
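The .222 break-even point follows from the Part II formula as it is quoted in the reader comments at the end of this page (hits divided by 2, minus outs made divided by 7). A small sketch, with function names chosen only for illustration:

```python
def hits_component(hits, outs_made):
    # Part II as quoted in the reader comment: hits / 2 - outs made / 7
    return hits / 2 - outs_made / 7


def break_even_average():
    # The component is zero when hits / 2 == outs / 7, i.e. 2 hits per 7 outs,
    # which is 2 hits in every 9 hit-or-out events.
    hits, outs = 2, 7
    return hits / (hits + outs)


print(hits_component(2, 7), round(break_even_average(), 3))
```

Two hits for every seven outs is an average of 2/9, or .222. Extra outs (GIDP, CS, sacrifices) add to the outs side without adding at bats, which is why they can push a hitter below zero even when his listed average is above .222.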

On Part III (OPS and batting average), the highest-scoring season of all time was by Babe Ruth, 1921 (257) and the lowest was by Bill Bergen, 1909 (-50).

On Part IV (Stolen Bases and Extra Base Hits), the highest-scoring season of all time was by Arlie Latham, 1887 (176). However, Latham had 129 stolen bases that season, and there is no record of his caught stealing, which obviously biases the stat to his advantage. The highest-scoring season for which there is caught stealing data was by Carlos Beltran in 2004. Beltran had 83 extra base hits (36-9-38) and 42 stolen bases, while being caught stealing only 3 times—a score of 119. The lowest-scoring season of all time was by Zip Collins in 1916—eight extra base hits, four stolen bases, 21 caught stealing for a score of –30.
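The Part IV figures above can be checked directly against the formula. A minimal sketch (the function name is made up for illustration; the inputs are the ones given in the paragraph above):

```python
def speed_power_component(extra_base_hits, stolen_bases, caught_stealing):
    # Part IV: extra base hits plus stolen bases, minus 2 times caught stealing
    return extra_base_hits + stolen_bases - 2 * caught_stealing


beltran_2004 = speed_power_component(83, 42, 3)
collins_1916 = speed_power_component(8, 4, 21)

print(beltran_2004, collins_1916)
```

This gives 119 for Beltran and -30 for Collins, matching the scores quoted. For a season like Latham's 1887, where caught stealing was not recorded, treating the missing value as zero is what inflates the score.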

Overall, the highest-scoring season by a non-pitcher was by Ruth, 1921—717. Ruth ranks far behind Old Hoss Radbourn, but ahead of any 20th-century or 21st-century pitcher. (Those of you who are extra-alert will pick up on the fact that there is a problem with describing Ruth in 1921 as a non-pitcher, since he did pitch in two games that season, going 2-0. His score as a pitcher in 1921 was 8 points, so you can, if you want, score his season at 725.)

The highest-scoring season since 1921 was by Lou Gehrig, 1927—47 homers, 175 RBI, .373 average. That scores at 667.

The highest-scoring season since 1927 was by Chuck Klein, 1930—59 doubles, 40 homers, 170 RBI, .386 average. That scores at 661.

The highest-scoring season since 1930 was by Jimmie Foxx, 1932--.364, 58 homers, 169 RBI. That scores at 628.

The highest-scoring season since 1932 was by Gehrig, again, in 1936--.354 with 49 homers, 152 RBI, 130 walks. That scores at 612.

The highest-scoring season since 1936 was by Todd Helton in 2000--.372 with 59 doubles, 42 homers, 147 RBI. That scores at 594.

The highest-scoring season since 2000 was by Barry Bonds in 2001--.328 with 73 homers, 137 RBI. That scores at 593.

It may be surprising that Helton scores even one point ahead of Bonds, since, by most sabermetric models, Bonds’ 2001 (and 2002) seasons are the most impressive seasons since Ruth, or the most impressive ever. I am not saying that the usual analysis is wrong. I am just saying. . .compare Helton to Bonds. Helton hit for an average 40 points higher, drove in more runs, scored more runs. Without park adjustments, it is not entirely unreasonable to say that Helton’s season is just as good. Anyway, the best season since 2001 was by Bonds in 2002, then by Bonds in 2004.

The worst hitter’s seasons ever were by Bill Bergen in 1911 (-53), 1906 (-64) and 1908 (-78).

Since I have not introduced this system to the public until now, I have sort of been “writing around” it, avoiding writing about things that rely on these season scores. It’s just a rough model and it no doubt has its flaws, but I use it every day, and I think it will be a relief to me to have the system on record so that I can refer to it.

OK, now let me take my tests. My argument is that my system, for all of its flaws, will get the same answers that you would give to most of the ordinary type of questions that we ask the system to deal with. . .what’s a hitter’s best year, who was a team’s best pitcher, etc. My little system’s answers are not always right, but they are always fairly reasonable. Here are my answers:

Stan Musial’s best season was 1948—586 points. His second-best year was 1949, his third-best 1946.

Jerry Remy’s best year was 1978 (163)—but it’s almost too close to call. 1982 scores at 157 points. . .either of those is really almost the same.

Phil Rizzuto’s best year, obviously, was 1950 (332 points), and Roberto Alomar’s best season was 1999 (459 points).

Kevin Appier’s best year was 1993 (18-8, 2.56 ERA, 248 points). Tony Armas the pitcher’s best season was 2002 (12-12, 4.44 ERA, 80 points), and Tony Armas Senior’s best year was 1984 (.268 with 43 homers, 123 RBI, career high in both runs scored and RBI. Score 330.)

Rich Aurilia’s best year, obviously, was 2001—37 homers, 97 RBI, 206 hits, .324 average, 394 points.

The three best seasons of the 1920s were by Babe Ruth, 1921, Lou Gehrig, 1927, and Rogers Hornsby, 1922:

Player, Year	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	CS	Avg.
Ruth, 1921	152	540	177	204	44	16	59	171	145	81	17	13	.378
Gehrig, 1927	155	584	149	218	52	18	47	175	109	84	10	8	.373
Hornsby, 1922	154	623	141	250	46	14	42	152	65	50	17	12	.401
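Part I of the batter's score (runs scored plus RBI, minus outs made divided by 7) can be roughly reproduced from this table. The sketch below approximates outs made as at bats minus hits plus caught stealing; that is an assumption, since sacrifices and double plays also count as outs but are not in the table. Names are illustrative only.

```python
def runs_component(runs, rbi, outs_made):
    # Part I: runs scored plus RBI, minus outs made divided by 7
    return runs + rbi - outs_made / 7


def approx_outs(at_bats, hits, caught_stealing):
    # Assumption: ignores sacrifices and GIDP, which the table does not list
    return at_bats - hits + caught_stealing


# (AB, R, H, RBI, CS) taken from the table above
seasons = {
    "Ruth, 1921": (540, 177, 204, 171, 13),
    "Gehrig, 1927": (584, 149, 218, 175, 8),
    "Hornsby, 1922": (623, 141, 250, 152, 12),
}

part_one = {
    name: runs_component(r, rbi, approx_outs(ab, h, cs))
    for name, (ab, r, h, rbi, cs) in seasons.items()
}

for name, score in part_one.items():
    print(name, round(score))
```

With that approximation Ruth's 1921 comes out at about 298, which agrees with the Part I record quoted earlier, and the three seasons rank Ruth, Gehrig, Hornsby on this component.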

The three best hitter’s seasons of the 1950s are 1) Mickey Mantle, 1956, 2) Henry Aaron, 1959, and 3) Duke Snider, 1953. Two of the three did not win MVP awards, because the voters in that era liked to give the award to catchers and shortstops.

For the 1970s, my three are 1) Rod Carew, 1977 (.388 with 100 RBI), 2) George Foster, same year (52 homers, 149 RBI), and 3) Billy Williams, 1970 (.333 with 42 homers, 129 RBI).

For best pitcher’s season of the 1930s, my answers are 1) Lefty Grove, 1931 (31 wins, 4 losses, 2.06 ERA), 2) Lefty Grove, 1930 (28-5), and 3) Dizzy Dean, 1934 (30-7). Best pitcher’s seasons of the 1960s: 1. Koufax, 1965, 2. McLain, 1968, 3. Koufax, 1963, not that Bob Gibson in ’68 is a bad answer.

Best pitcher’s seasons of the 1990s. . .1) Pedro Martinez, 1999, 2) Roger Clemens, 1997, 3) John Smoltz, 1996.

Best season ever by a Tiger hitter was Ty Cobb, 1911 (fifth-greatest hitter’s season of all time.) Best season by a Tiger in the 50s was Al Kaline, 1955, and for the 60s, of course, Norm Cash in ’61. The best season by a Reds’ player in the 80s was Eric Davis in 1987 (37 homers, 50 stolen bases).

The best season of Eckersley’s career, starter or reliever, was his MVP year, 1992, and the best season of Wilhelm’s career was 1964, although it’s basically a dead heat between ’64 and his rookie season, 1952. Smoltz’ best season by far was 1996, although the relief years are certainly impressive.

The best season by a Red Sox pitcher in the 1960s actually comes in as a tie (I didn’t check any of the answers as I was writing the questions). Both Radatz in ’64 and Jim Lonborg in ’67 score at 312. Actually, it’s Radatz at 311.9 and Lonborg at 311.7, but I don’t count the decimals because this isn’t intended to be a precise, argument-settling analysis, but rather a simple system to recognize the obvious. It’s obvious that both Radatz and Lonborg are really good, and it’s not obvious which one is better.

The best Cardinals’ pitcher of the 1990s was. . .wait a minute, there must be somebody here. Actually it’s Tewksbury, 1992 (16-5, 2.16 ERA). Just ahead of Lee Smith, 1991.

How’d I do? There is no compelling logic to my system; there is no underlying genius to it. It’s just a way to systematically choose the obvious answer when I am doing a study that requires me to choose somebody to represent a group, and I don’t want to risk biasing the data by subjectively selecting one.

Bill James
Brookline, Massachusetts

COMMENTS (2 Comments, most recent shown first)

yorobert
under "part II-hits", you say:

"hits, divided by 2, minus outs made divided by 7. of course, we could simplify the formula by simply dividing outs made by 3.5 and subtracting it once, rather than dividing it by 7 and subtracting it twice."

there is an error in these two sentences; if one sentence is correct, then the other must necessarily be wrong.

great article.

12:25 AM Jun 22nd

tangotiger
I don't see how you are not biasing the data. If you are researching the effects of RBIs, then Part 1 will provide a huge source of bias. You really have to look at what you are studying on a case-by-case basis, and determine the selective sampling and bias issues.
10:37 AM Mar 7th
