Teams on Paper 1

By Bill James

March 16, 2009

I. The Issue

What is the most under-achieving team of all time? What is the most over-achieving team? Who should have won the National League in 1969, the year of the Miracle Mets? I mean, we know that the Mets won it, and we know that the Mets weren’t really all that good, but . . . who should have won it? Who had the best team?

What is the most talented team of all time? The most talented team of the 1950s? The 1960s? What is the most talented team in the history of the Dodgers, or the history of the Padres, or the history of the Red Sox?

Let’s take the 1963 Tigers. The 1963 Tigers finished under .500, at 79-83. At the same time, if you look at that roster. . .that’s a heck of a team. They had two catchers, Gus Triandos and Bill Freehan. Triandos had a fine career, being a regular catcher for eight seasons, a feared power hitter and respected for his ability to work with young pitchers. In 1957 he threw out 42 of 63 base stealers, which I believe is the highest throw-out percentage on record for a regular catcher. Freehan was quite a bit better than Triandos; he was an eight-time All-Star.

In the outfield they had Rocky Colavito, a near-Hall of Famer, Al Kaline, an actual Hall of Famer, and Bill Bruton, who was a fine player. At first base they had Norm Cash, who had a long career and hit 377 career home runs. At shortstop they had Dick McAuliffe, an extremely good player.

The starting rotation featured Jim Bunning (a Hall of Famer) and Mickey Lolich, a near-Hall of Famer. Backing them up were Hank Aguirre, who had led the league in ERA in ’62, and Phil Regan, who was a solid pitcher with a number of career highlights. Few or none of these key players were too old or too young; they were almost all in their prime.

That’s a lot of talent, right? Yes, they finished 79-83, but . . . it’s still a hell of a team on paper.

That’s what this article is about: looking at teams on paper, so to speak. It’s just an expression. . .how teams look on paper depends entirely on what you put on paper. . .but you know what we mean by that. On paper, the 2006 Cleveland Indians had a decent bullpen.

How teams look on paper is something that we normally throw away as soon as the games are played. It doesn’t matter once the season is over how good you should have been; it only matters whether you actually performed.

What I am saying in this article . . . this series of five articles, actually . . . is “suppose that we don’t throw that away.” Suppose that we stick with that a little while, and develop a method to evaluate not how good teams actually were, but how good they should have been.

Why?

Because there is a whole list of questions that can be accessed through that approach that are difficult to answer through the traditional approach of looking at teams by after-the-fact statistics.

I outlined some of those questions at the top of the article, and those could be teased out into a long list of similar questions. As we can ask the question “Who should have won the National League in 1969?” we can ask about 1968, or 1970, or any other year, or any other league. As we can ask about the most under-achieving team of all time, we can ask about the most under-achieving team of the 1970s, or any other decade.

But this approach also gives us access to another range of questions. We could re-evaluate the career of Gene Mauch, for example, by looking not merely at the performance of his teams, but at the performance of those teams relative to the talent—Gene Mauch, or Alvin Dark, or Tommy Lasorda, or whoever it is you want to re-evaluate.

Why do some teams under-achieve, and others over-achieve? Are there characteristics of under-achieving and over-achieving teams?

I actually didn’t get into all of those questions, even by the end of this five-article set, so I shouldn’t set you up to think that all of those answers to those questions will be forthcoming. They won’t. But I’ve been working on this for two months, and I have learned quite a number of things that I think are very interesting. I have come to an understanding about a certain number of issues that have puzzled me for many years.

And we gain a certain insight into the comparison of great teams. After-the-fact analysis is locked at .500 for the league. Somebody has to win; somebody has to lose. It creates the illusion that one league is the same as the next—even when they are not. There are certain teams that I thought were great teams that I now see, looking at it through this lens, are really not that impressive. There are other teams that might be added in their place to the roster of the greatest ever.

This is a historical article. The approach that I outline is of no use at all in comparing teams in 2009, and is of limited use for teams since 1990. I’ll outline the method in the next article.

This article runs more than 50 pages. We’ll run it as a five-part series:

I. Monday Introduction and Explanation of the Method

II. Tuesday The Strongest Teams Ever

III. Wednesday The Over-Achievers

IV. Thursday The Under-Achievers

V. Friday Other Teams, Other Notes

II. The Teams-on-Paper Method

We have to accept, as a starting point, that what we’re trying to do here is impossible. There is no way of saying precisely how good the 1985 Atlanta Braves should have been. We know they expected to be good going into the season; we know they lost 96 games. We have no way of knowing how good they should have been.

However, while we cannot answer these questions with mathematical certainty, what we can get is answers that are systematic and reasonable. Henry Aaron was the right fielder for the 1957 Milwaukee Braves, Reggie Jackson was the right fielder for the 1968 Oakland A’s, Tom Brunansky was the right fielder for the 1987 Minnesota Twins, Ollie Brown was the right fielder for the 1966 San Francisco Giants, Ted Gullic was the right fielder for the 1930 St. Louis Browns, and they were all about the same age. We know that Hank Aaron was a greater player than Reggie Jackson, we know that Reggie was a better player than Tom Brunansky, we know that Brunansky was a better player than Ollie Brown, and we know that Ollie Brown was a better player than Ted Gullic. These are not really debatable conclusions.

What we need, then, is a system that places a value on “Henry Aaron, 1957” that is greater than the value placed on “Reggie Jackson, 1968”, a greater value on “Reggie Jackson, 1968” than on “Tom Brunansky, 1987”, etc. (OK, Tom Brunansky was a couple of years older; I couldn’t find a 22- or 23-year-old right fielder who fit.)

We can’t use the players’ season statistics, because that leads us back to the question of how good the teams actually were. We know that Henry Rodriguez had a great year for Montreal in 1996, but the fact is, he was Henry Rodriguez. He wasn’t a great player. He wasn’t even a really good player.

We can’t use the player’s season statistics to derive our values, or we’ll circle back to the conclusion that everybody was as good as they should have been. In 1968 Harmon Killebrew hit .210 with 17 homers, 40 RBI; Ken Harrelson hit .275 with 35 homers and 109 RBI. Nonetheless, Harmon Killebrew was Harmon Killebrew and Ken Harrelson was Ken Harrelson. We have (essentially) the same relative values for them in 1968 as we have in 1967 or 1969, when Killebrew was very much the better player.

The basis of our system is two things:

1) age,

2) career games played (or career innings for pitchers.)

I’ll explain first a few simple examples in which we don’t get into anything else . . . let’s do Tom Brunansky, 1987, and Ollie Brown, 1966. We divide the player’s career games played by 100. Brunansky played 1,800 games in his career, so that makes 18.00. Ollie Brown played 1,221 games in his career, so that makes 12.21.

This we modify by age, in the following way:

Age adjustment = [(Age – 18) * (38 – Age) + 100] / 190

At age 20 this formula has a value of .716; at age 24, a value of .968; at age 28, a value of 1.053; at age 32, a value of .968; at age 36, a value of .716; and at age 40, a value of .295. Thus, this part of the formula increases a player’s expected value when he is in his prime, decreases it slightly when he is outside his prime, and decreases it meaningfully when the player is a long way out of his prime.

Tom Brunansky was 26 years old in 1987, and the adjustment for age 26 is 1.032, so Brunansky’s value in 1987 is (18 * 1.032) = 18.57. We convert this to an integer, and Brunansky’s “should be” or “expected” value in 1987 is 19.

Ollie Brown was 22 years old in 1966, and the “age adjustment” for a 22-year-old is .863. (Actually it isn’t; I’ll make another adjustment later . . . but for now, let’s assume it is .863.) That makes, for Ollie Brown in 1966, 12.21 * .863, which is 10.54, which will round up to 11, or would, were it not for the later adjustment.
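The age adjustment and the two examples above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described; the function names are made up for the sketch, and rounding halves upward is an assumption, since the text only says the value is converted to an integer.

```python
def age_adjustment(age):
    """Age multiplier from the text: ((age - 18) * (38 - age) + 100) / 190."""
    return ((age - 18) * (38 - age) + 100) / 190


def to_integer(value):
    """Round to the nearest whole number (half-up is an assumption)."""
    return int(value + 0.5)


# Multipliers quoted in the text for selected ages.
for age in (20, 24, 28, 32, 36, 40):
    print(age, round(age_adjustment(age), 3))

# Tom Brunansky, 1987: 1,800 career games, age 26.
brunansky = (1800 / 100) * age_adjustment(26)
# Ollie Brown, 1966: 1,221 career games, age 22 (before the later adjustment).
brown = (1221 / 100) * age_adjustment(22)

print(round(brunansky, 2), to_integer(brunansky))  # 18.57 19
print(round(brown, 2), to_integer(brown))          # 10.54 11
```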

There is a lot more of the system to be explained, but let’s deal first with the “long career = good player” question. I know that there will be people who will object to the system being fundamentally based on how many major league games the player plays, arguing that playing for a long time doesn’t prove that you’re good, and having a short career doesn’t prove that you’re not good.

Well, yeah, that’s true on a certain level. The fact is that career length is, in general, an extremely reliable indicator of a player’s ability. There are cases, yes, where players who aren’t great play for a long time, and there are cases where players who are good have very short careers. There are such cases, yes, and there are damned few of either one. We can deal with the exceptional cases, to an extent, by some of the things that we’ll do later on. In general, a player’s career games played are an extremely good indicator of how good a player he was, and we’re trying to do something which is impossible—say how good a player should have been in a given season—without using his statistics for the season. I’m not saying it is perfect. It’s easy, and it’s pretty good most of the time. We’re going to use that as the basis for our system.

The next thing we need to deal with is Innings Pitched. The pitcher, as well as the hitter, gets credit for each 100 games played, and, in calculating that, we give the pitcher credit for his career games as a hitter or a pitcher, so that those guys who get at bats as a pinch hitter or get into games as a pinch runner get credit for that.

Beyond games, a player gets 1.00 additional point for each 250 innings pitched. Babe Ruth played in 2,503 games, which is 25.03 basis points. Ruth also pitched 1,220 innings for another 4.88 basis points, giving him 29.91 basis points for each season’s calculation.
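As a quick check of the Ruth arithmetic, here is a minimal Python sketch of the basis-point rule as stated: one point per 100 career games plus one point per 250 career innings pitched. The function name is invented for the illustration.

```python
def basis_points(career_games, career_innings=0):
    """One point per 100 career games, plus one point per 250 innings pitched."""
    return career_games / 100 + career_innings / 250


# Babe Ruth: 2,503 games and 1,220 innings, the figures used in the text.
ruth = basis_points(2503, 1220)
print(round(ruth, 2))  # 29.91
```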

There are a few cases in baseball history in which this combining of pitching and hitting basis points creates misleading expectations. The third baseman for the 1934 Philadelphia Phillies was Bucky Walters, who later had a long and distinguished career on the mound. Our system thinks that Walters (the third baseman) should be a really good player, and puts a premium value on him (27 points). That’s obviously wrong.

But that is also the only case I have found so far, after entering data for about 5,000 player/seasons into my spreadsheet, in which this practice causes a real problem. You have to cover Babe Ruth somehow; his career is “short” because he spent several years as a pitcher. We have to let him get credit for that, or we have misleading numbers for Babe Ruth.

The larger issue is that the generalization that a long career equals a quality player is not as reliable for pitchers as it is for hitters. There are more pitchers who have career lengths not indicative of the quality of their performance than there are hitters. Also, the generalization that players have their best years at ages 25-30 is not as reliable for pitchers as it is for hitters. We can do some things to deal with that later on, but—again, it’s not a perfect system, and can’t be. It could be better than it is.

The next thing I added to the system was a “Hall of Fame bonus.” We can make the Hall of Famers stand out from the crowd better by giving the Hall of Famers 10 extra basis points—as if they had played another 1,000 games in their careers. Getting back to our “right field group”. . .Reggie Jackson and Henry Aaron are Hall of Famers. Thus, comparing Aaron, Reggie, Brunansky, Ollie Brown and Ted Gullic in their relevant seasons:

Aaron, 1957 32.98 (Games) + 10 (Hall) = 42.98 * .921 (age adjustment) = 39.53

Reggie, 1968 28.20 (Games) + 10 (Hall) = 38.20 * .863 (age adjustment) = 32.97

Brunansky, 1987 18.00 (Games) * 1.032 (age adjustment) = 18.57

Ollie, 1966 12.21 (Games) * .863 (age adjustment) = 10.54

Gullic, 1930 1.96 (Games) * .921 (age adjustment) = 1.81

We have Aaron at 40 points, Reggie at 33, Brunansky at 19, Ollie Brown at 11, and Ted Gullic at 2. These are almost, but not quite, the final values for these players.

Switching now to the pitchers. . . .remember, it’s essentially the same system for pitchers that it is for position players. I had two problems with the pitchers: one, that my values were running a little bit low, and two, that the generalization that a long career equals a good player is less reliable for pitchers than it is for position players.

I decided to add basis points, for pitchers, based on Career Wins Minus Losses. For each eight games that a pitcher was over .500, in his career, I added one basis point.

These were added as integers, not fractions (that is, 23 Wins – Losses is 2 basis points, not 2.875), and the basis points added by this rule are limited to 10. Also, this is an alternative to the Hall of Fame, not in addition to it. That is, players who are in the Hall of Fame receive no additional points for Career Wins Minus Losses.

What this means, in practice, is that for really good pitchers, it makes no difference whether they are in the Hall of Fame or not. Carl Mays has been left out of the Hall of Fame because he was an obnoxious person who killed a batter with a pitch and wasn’t particularly remorseful about it; nonetheless, he was 81 games over .500 in his career (207-126), so he gets ten points for that. If he was in the Hall of Fame he would get the 10 points for being in the Hall of Fame, but he would lose the 10 points for his won-lost record, so he would come out the same. Randy Johnson, Roger Clemens and Mike Mussina come out the same now as they will when they are elected to the Hall of Fame. To a few pitchers, it makes a little bit of difference. Ron Guidry, for example, was 170-91 in his career—79 games over .500—so he gets nine points for that. Wes Ferrell gets 9 points, Urban Shocker and Vic Raschi get 8, Luis Tiant and Jim Kaat get 7, Orel Hershiser gets 6, Gary Nolan gets 5, Dave Stewart and Jack Billingham get 4, Frank Viola and Juan Pizarro get 3.

These are added to the basis points BEFORE the age modifications are put in place. It’s a useful adjustment that helps the system work better, and I wish there was something comparable (and equally easy) I could do for hitters, but there just isn’t.

Then I got to worrying about good position players who are screwed in the system because the Hall of Fame didn’t like them, and I realized that one thing I could do would be to add 3 points for an MVP season. Roger Maris, for example, has a fairly short career and is not in the Hall of Fame, but one thing we can do to recognize that he was a better player than the other guys who played 1,400-1,500 games is to give him 3 points for each of his two MVP seasons, 6 basis points total.

Maris gets these basis points not only in 1960 and 1961, when he won the Award, but throughout his career. The fact that he won MVP Awards is part of who he is. If you say that “their first baseman was Don Mattingly” . . . well, there’s an MVP Award there that is a part of who Don Mattingly is, regardless of whether we’re talking about his MVP season or not.

But the total basis points which are awarded for MVPs, the Hall of Fame and Wins over .500 cannot exceed 10. In other words, Hal Newhouser doesn’t get 10 points for being in the Hall of Fame, 6 points for his two MVP Awards and 7 points for being 57 games over .500. We treat those as one category, maximum 10 basis points. Newhouser—like all other Hall of Famers—maxes out the category.
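The bonus category described above (Hall of Fame, MVP Awards, and career wins over .500, capped at 10 in total) might be sketched as follows. The function and parameter names are assumptions made for the illustration; the examples use only figures given in the text.

```python
def bonus_points(hall_of_fame=False, mvp_awards=0, games_over_500=0):
    """Combined bonus for Hall of Fame, MVP Awards and wins over .500 (max 10)."""
    if hall_of_fame:
        return 10  # Hall of Famers max out the category
    # One whole point per eight games over .500; fractions are dropped.
    win_points = max(0, games_over_500) // 8
    return min(10, 3 * mvp_awards + win_points)


print(bonus_points(games_over_500=207 - 126))  # Carl Mays: 10
print(bonus_points(games_over_500=170 - 91))   # Ron Guidry: 9
print(bonus_points(games_over_500=23))         # 23 games over .500: 2
print(bonus_points(mvp_awards=2))              # Roger Maris: 6
print(bonus_points(hall_of_fame=True, mvp_awards=2, games_over_500=57))  # Hal Newhouser: 10
```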

OK, let’s compare how a few pitchers rank; I’ll use 32-year-old pitchers so that age isn’t an issue. Let’s take Warren Spahn, 1953, Jim Palmer, 1978, Dave Stewart, 1989, John Montefusco, 1982, and Paul Abbott, 2000. We start with these pitchers’ career games and innings pitched:

	Games	Innings
Warren Spahn	782	5246
Jim Palmer	558	3948
Dave Stewart	523	2630
John Montefusco	298	1651
Paul Abbott	162	721

These are the basis points that these pitchers get for their career games and innings pitched:

	G	IP	P1	P2	Total
Warren Spahn	782	5246	7.82	20.984	28.804
Jim Palmer	558	3948	5.58	15.792	21.372
Dave Stewart	523	2630	5.23	10.52	15.750
John Montefusco	298	1651	2.98	6.604	9.584
Paul Abbott	162	721	1.62	2.884	4.504

Now we add in the ten points to Spahn and Palmer for being Hall of Famers:

	G	IP	P1	P2	P3	Total
Warren Spahn	782	5246	7.82	20.984	10	38.804
Jim Palmer	558	3948	5.58	15.792	10	31.372
Dave Stewart	523	2630	5.23	10.52		15.750
John Montefusco	298	1651	2.98	6.604		9.584
Paul Abbott	162	721	1.62	2.884		4.504

Dave Stewart was 39 games over .500 in his career, so he gets four points for that.


	G	IP	P1	P2	P3	Total
Warren Spahn	782	5246	7.82	20.984	10	38.804
Jim Palmer	558	3948	5.58	15.792	10	31.372
Dave Stewart	523	2630	5.23	10.52	4	19.750
John Montefusco	298	1651	2.98	6.604		9.584
Paul Abbott	162	721	1.62	2.884		4.504

Montefusco and Abbott were over .500, but less than eight games over .500, so they don’t move up. All of these pitchers were 32 years old in the seasons we are comparing and the adjustment for a 32-year-old is .968421, so we’ll multiply all of those totals by .968421:

		Age
	Total	Adjustment	Value
Warren Spahn	38.804	.968421	37.579
Jim Palmer	31.372	.968421	30.381
Dave Stewart	19.75	.968421	19.126
John Montefusco	9.584	.968421	9.281
Paul Abbott	4.504	.968421	4.362

And then we’ll convert those to integers:

		Age		Integer
	Total	Adjustment	Value	Value
Warren Spahn	38.804	.968421	37.579	38
Jim Palmer	31.372	.968421	30.381	30
Dave Stewart	19.75	.968421	19.126	19
John Montefusco	9.584	.968421	9.281	9
Paul Abbott	4.504	.968421	4.362	4

Giving us values of 38 for Spahn, 30 for Palmer, 19 for Stewart, 9 for Montefusco and 4 for Paul Abbott. This is the end of the process for these players.
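The whole pitcher calculation can be reproduced with a short Python sketch. It takes the games, innings, and bonus points straight from the tables above; the function names are invented for the illustration, and rounding halves upward is an assumption.

```python
def age_adjustment(age):
    """Age multiplier from the text: ((age - 18) * (38 - age) + 100) / 190."""
    return ((age - 18) * (38 - age) + 100) / 190


def pitcher_value(games, innings, bonus, age):
    """Games/100 + innings/250 + bonus points, then the age adjustment."""
    return (games / 100 + innings / 250 + bonus) * age_adjustment(age)


# (career games, career innings, bonus points) as listed in the tables above.
pitchers = {
    "Warren Spahn, 1953": (782, 5246, 10),
    "Jim Palmer, 1978": (558, 3948, 10),
    "Dave Stewart, 1989": (523, 2630, 4),
    "John Montefusco, 1982": (298, 1651, 0),
    "Paul Abbott, 2000": (162, 721, 0),
}

results = {}
for name, (games, innings, bonus) in pitchers.items():
    value = pitcher_value(games, innings, bonus, age=32)
    results[name] = int(value + 0.5)  # nearest integer; half-up is an assumption
    print(f"{name}: {value:.3f} -> {results[name]}")
```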

Those numbers represent how we would expect these players to perform, just based on their age and who they were. In fact, Spahn went 23-7 with a 2.10 ERA, Palmer went 21-12 with a 2.46 ERA, Stewart went 21-9 with a 3.32 ERA, Montefusco was 10-11 with a 4.00 ERA, and Abbott went 9-7 with a 4.22 ERA, so their performance sort of generally tracks their expected values. It usually does but not always; that, after all, is the whole point of the exercise, that players you expect to be good sometimes aren’t, and players that you wouldn’t expect to be good sometimes are.

Back to the position players. The “age adjustment” works within a certain range, but there were some cases where it wasn’t sufficient. I added two other age rules, which I call the “extreme age” adjustments:

1) If a player (non-pitcher) is more than 33 years old AND played less than 2,000 games in his career, reduce his expected value for the season by 30%.

2) If a player (non-pitcher) is less than 23 years old, reduce his expected value for the season by 15%.

The first rule above was tremendously helpful in getting more realistic values for that class of players. Let’s talk about . . . Bobby Avila when he was 34 years old. Bobby Avila played 1,300 games in his career, and he was really good until he was 30.

Without this rule, our Expected Values for Avila would be 13 when he was 30 years old, and 11 when he was 34 years old. When he was 30 he hit .341 with 15 homers; when he was 34 he hit .253 with 5 homers. Our ratio is off. With this rule, our ratio becomes 13 to 8—which is still off, but it’s better. It’s almost always better. In doing this research I found virtually no cases of players who had short careers who had prime seasons later than age 33. There are cases where this rule hurts us—that is, leads us to make inappropriate adjustments—like Bill Bruton and Joe DiMaggio. Bruton and DiMaggio had short careers but had prime seasons late in life, so we would be better off (in a few seasons) if we didn’t have to apply this adjustment.

But there are a handful of cases where this adjustment works against us and 200 where it helps us, so on the whole, I’m happy with it. The other adjustment, for non-pitchers aged 22 or younger, is a smaller adjustment because it is based on a less reliable generalization. In general, however, I had too-high expectations for those few players who were regulars at ages 22 and less. Barry Bonds in 1987; Carl Yastrzemski in 1961 and 1962—good players, yes, but not that close to being the players that they would later be. I put the adjustment in because, working with the data, I became aware that the adjustment was needed. Sometimes it gets in the way—for example, in 1961 we discount expectations for Vada Pinson, because Pinson was only 22 years old, but Pinson had his greatest year that year, and the discounting of expectations for him is not really helpful to us.
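The two “extreme age” rules can be sketched as a multiplier applied after the ordinary age adjustment. This is an illustration only: the function names are made up, and the Avila figures are the ones quoted above (1,300 career games, ages 30 and 34).

```python
def age_adjustment(age):
    """Age multiplier from the text: ((age - 18) * (38 - age) + 100) / 190."""
    return ((age - 18) * (38 - age) + 100) / 190


def extreme_age_factor(age, career_games, short_career=2000):
    """Extra discount for non-pitchers at the edges of the age range."""
    if age > 33 and career_games < short_career:
        return 0.70  # older than 33 with a short career: reduce by 30%
    if age < 23:
        return 0.85  # younger than 23: reduce by 15%
    return 1.0


def to_integer(value):
    return int(value + 0.5)  # nearest integer; half-up is an assumption


avila_games = 1300
for age in (30, 34):
    base = avila_games / 100 * age_adjustment(age)
    adjusted = base * extreme_age_factor(age, avila_games)
    print(age, to_integer(base), to_integer(adjusted))
# 30 13 13
# 34 11 8
```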

In looking at our right field group before, then, we had this:

Aaron, 1957 32.98 (Games) + 10 (Hall) = 42.98 * .921 (age adjustment) = 39.53

Reggie, 1968 28.20 (Games) + 10 (Hall) = 38.20 * .863 (age adjustment) = 32.97

Brunansky, 1987 18.00 (Games) * 1.032 (age adjustment) = 18.57

Ollie, 1966 12.21 (Games) * .863 (age adjustment) = 10.54

Gullic, 1930 1.96 (Games) * .921 (age adjustment) = 1.81

Reggie Jackson in 1968 and Ollie Brown in 1966, however, were 22 years old. We’re going to adjust their expected values down by 15% because they were so young. That makes the values for these players:

Aaron, 1957 39.53 39.53

Reggie, 1968 32.97 * .85 = 28.02

Bruno, 1987 18.57 18.57

Ollie, 1966 10.54 * .85 = 8.96

Gullic, 1930 1.81 1.81

Which we convert to integers:

Hank Aaron, 1957 40

Reggie Jackson, 1968 28

Tom Brunansky, 1987 19

Ollie Brown, 1966 9

Ted Gullic, 1930 2

This is the end of the process for these players, and these are their actual plate appearances, OPS and triple-crown stats in those seasons:

Player, Year	Ex Value	PA	OPS	HR	RBI	Avg
Hank Aaron, 1957	40	675	.978	44	132	.322
Reggie Jackson, 1968	28	614	.768	29	74	.250
Tom Brunansky, 1987	19	614	.841	32	85	.259
Ollie Brown, 1966	9	386	.622	7	33	.233
Ted Gullic, 1930	2	347	.655	4	44	.250

There are two more rules that I need to tell you about here. . .or maybe I should have told you about them earlier; who knows. Anyway, the two rules are:

1) Players older than 40 are entered as being 40 years old, and

2) The maximum expected value for any player in any season is 40.

Aaron at his peak would have a value of 47 or something, but . . . we cap it at 40. We’re not trying to answer the question “Who should have been the greatest player in history, in which season?” We’re just trying to get numbers that represent high values for great players, good values for good players, and limited values for limited players; that’s all. I didn’t want a list of the highest-ranked players of all time; it’s a distraction from what we’re trying to do.

So far I have completed evaluations for 250 teams, plus 1500 to 2000 miscellaneous individual seasons not part of a completed team. In doing that I have found one player, but only one, who, despite playing enough to be listed as the team’s regular in that slot, scores at “zero” for his expected value. That one player was Tony Daniels, second baseman for the 1945 Philadelphia Phillies. Daniels played only 76 games in his career, all of them in 1945, and was only 20 years old at that time. A career that short, that far off-prime, you figure he probably wasn’t too good. We figure his value as 0.76 * .716 (age adjustment) * .85 (extreme age adjustment), and we get 0.4624, which rounds off to zero. Daniels hit .200 with no homers, 10 RBI in 76 games, so he lived up to expectations.
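Putting the pieces together for non-pitchers, a minimal Python sketch of the full calculation might look like this. It assumes the rules exactly as stated above (age capped at 40, value capped at 40, the two extreme-age discounts); the function name and the half-up rounding are assumptions of the sketch, not part of the method as written.

```python
def age_adjustment(age):
    """Age multiplier from the text: ((age - 18) * (38 - age) + 100) / 190."""
    return ((age - 18) * (38 - age) + 100) / 190


def expected_value(career_games, age, bonus=0, short_career=2000):
    """Expected value for a non-pitcher in one season."""
    age = min(age, 40)  # players older than 40 are entered as 40
    value = (career_games / 100 + bonus) * age_adjustment(age)
    if age > 33 and career_games < short_career:
        value *= 0.70   # extreme-age rule for older players with short careers
    elif age < 23:
        value *= 0.85   # extreme-age rule for very young players
    return min(40, int(value + 0.5))  # cap at 40; half-up rounding is assumed


print(expected_value(3298, 23, bonus=10))  # Hank Aaron, 1957: 40
print(expected_value(2820, 22, bonus=10))  # Reggie Jackson, 1968: 28
print(expected_value(1800, 26))            # Tom Brunansky, 1987: 19
print(expected_value(1221, 22))            # Ollie Brown, 1966: 9
print(expected_value(196, 23))             # Ted Gullic, 1930: 2
print(expected_value(76, 20))              # Tony Daniels, 1945: 0
```

With the same inputs at age 28, Aaron's uncapped figure comes to about 45 under this formula, so the cap of 40 is what the function returns.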

Between the extremes of Tony Daniels and Hank Aaron, we could chart the values in this way:

Weak 0 to 9

Solid 10 to 19

Star 20 to 29

Superstar 30 to 40

And sometimes I sub-divide those into three groups (Weak Minus, Weak, Weak Plus, Solid Minus, Solid, Solid Plus, etc.) for the purpose of discussing and comparing teams. The season expected values, in other words, are sort of on the same scale as Win Shares.

Catchers and Relievers

The method as I have explained it so far is the complete system for all players other than catchers and closers. I have a spreadsheet that represents each team in major league history (not all filled in, of course), and on that spreadsheet there are slots for each team for a catcher, a first baseman, etc., four starting pitchers and a closer. The system as I have explained it so far is the same for all players, except that we do not apply the “extreme age” adjustments to the players in the pitching slots. Otherwise, pitcher or hitter, it’s the same system, it doesn’t matter.

But catchers have short careers, so my values for catchers were off, so I had to deal with that. I did that by crediting catchers with one point for every 75 games played, rather than every 100 games played. Also in the catcher slots, for the “extreme age” adjustment, a short career was considered to be 1,300 games, rather than 2,000. Otherwise, same system.

The system for closers is different in three ways:

1) Closers get one point for each 40 games played, rather than one for each 100 games played.

2) Closers get one point for each 350 innings pitched, rather than one for each 250 innings pitched.

3) Closers get no points for having career won-lost records over .500.

Otherwise the system is the same. The closers get the 10 points for being in the Hall of Fame, the 3 points for an MVP season sometime in their careers, the age adjustments . . . all of that is the same.
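The catcher and closer variations change only the divisors and the bonus rule, which a short Python sketch can make concrete. The function names and the two example careers are hypothetical, chosen only to show the arithmetic.

```python
CATCHER_SHORT_CAREER = 1300  # "short career" threshold for the catcher slot


def catcher_basis(career_games):
    """Catchers: one point per 75 career games instead of per 100."""
    return career_games / 75


def closer_basis(career_games, career_innings):
    """Closers: one point per 40 games and one per 350 innings pitched."""
    return career_games / 40 + career_innings / 350


def closer_bonus(hall_of_fame=False, mvp_awards=0):
    """Closers keep the Hall of Fame and MVP points but get none for won-lost record."""
    return 10 if hall_of_fame else min(10, 3 * mvp_awards)


# Hypothetical careers, for illustration only.
print(catcher_basis(1500))       # 20.0
print(closer_basis(600, 1050))   # 18.0
print(closer_bonus(hall_of_fame=True))  # 10
```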

In all candor, the system is more problematic for closers than it is for any other group of players. In general, the system still works. It gives the highest values for closers like Gossage, Fingers and Eckersley who had long careers and are in the Hall of Fame. It gives very low values to pitchers like Jeff Kunkel, Craig Anderson and Al Severinsen who had short careers but who somehow found themselves listed on the “closer” line when I analyzed their teams. It generally strings pitchers out between those extremes in a reasonable way.

But there are relief pitchers who have long careers although they’re really not all that stupenditerrific, and often those guys spend a year or two somewhere as a closer. We wind up with a system that gives 21 points to the 1989 San Francisco Giants because their closer was Craig Lefferts, and Lefferts had a long career. It’s not a huge problem, but there are more troublesome computations in the closers slot than elsewhere in our system.

Assembling the team estimates

The question we are addressing here is “How good a team should this have been?” How good a team should the 1973 Oakland A’s have been, compared to the 1961 Baltimore Orioles, or the 1986 Red Sox? We are working toward answers to those questions, and so far I have explained how to get answers for individual players, but not for teams. Now let’s talk about how to approach the teams.

I made no estimates for 19th century teams. Nineteenth century baseball was not major league baseball in any meaningful sense, regardless of what people may tell you, and I’m generally going to start ignoring the 19th century, and encouraging other people to do the same. Also, I did not include the Federal League (1914-1915) since our method obviously would not work well for those teams, and few people have an abiding interest in how good the 1915 Kansas City Packers should have been.

Anyway, I set up a spreadsheet that has data for all teams since 1900. The teams from 1900 to 1945 have “slots” for twelve players—a catcher, a first baseman, a second baseman, a third baseman, a shortstop, a left fielder, a center fielder, a right fielder, and four starting pitchers.

In deciding which players get the team’s slots, it is much more important to list the players who have the most playing time overall than it is to list the players with the most games at the position. Suppose that, at second base, one player plays 70 games, which is more than anyone else on the team, but another player plays 50 games at second base, 40 at third base, 35 games in right field, and 140 games all together, while the player who plays 70 games at second base plays only in those 70 games. The player who gets the slot is the utility player who plays 140 games total, of course, because our main goal here is to represent the strength of the team, not the strength of the position.

Maybe we’ll have a team on which one pitcher makes 25 starts and pitches 140 innings, while another makes 22 starts but pitches 180 innings. Who do you list as the fourth starter? The player with 180 innings, of course.
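The slot-assignment rule for position players can be sketched as "among the players who appeared at the position, take the one with the most games overall." This is a simplified reading of the rule, with a hypothetical data structure and the second-base example from above; close calls are still a judgment call.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    games_at_position: int
    total_games: int


def pick_regular(candidates):
    """Pick the candidate with the most total games among those who played the position."""
    eligible = [c for c in candidates if c.games_at_position > 0]
    return max(eligible, key=lambda c: c.total_games)


# The second-base example from the text; the labels are placeholders.
second_base = [
    Candidate("Second baseman only", games_at_position=70, total_games=70),
    Candidate("Utility player", games_at_position=50, total_games=140),
]
print(pick_regular(second_base).name)  # Utility player
```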

Occasionally there’s a close call as to who should be listed as the regular, but the interesting thing is, it never makes any difference. You think it might make a big difference whether you list Carlton Fisk as the starting catcher or Tim Blackwell, but it never works out that way in the real world, because if there’s a 30-point player on the one hand and a 4-point player on the other, the 30-point player always gets more playing time. When it’s a too-close-to-call situation, it’s virtually always a matter of one guy’s a “7” and the other guy is a “6”, and it doesn’t really matter which one you list as the regular anyway.

No teams in 1900 had bullpens and certainly none had closers, but gradually teams began to add them. How do we deal with that?

Beginning in 1946, I added a 13th slot for each team: Relief Ace. (Relief Aces did not become Closers until the 1980s. Until about 1980, a relief ace’s job was not defined by pitching the last inning of a win, thus it wouldn’t have made sense to call them “closers”, and nobody did.)

Then the Designated Hitter was added to the American League teams in 1973, so we added a slot for them. From 1900 through 2008 there are 2,204 teams in major league history:

728 which had 12 regulars (no relief ace and no DH),

980 which had 13 regulars (a relief ace, but no DH), and

496 which had 14 regulars (a relief ace AND a DH).

How do you compare one of these teams to another?

For those teams that I have been able to include in my study, I figured the average “position value” . . . alright, position expected value . . . and then multiplied that times 13 for all teams, regardless of whether they had 12, 13 or 14 slots. In other words, the “team total” for teams with 12 regulars is multiplied by 13/12, to increase it slightly, and the team total for teams with 14 regulars is multiplied by 13/14. My logic is, if a team has a designated hitter, that doesn’t make them better than a team that doesn’t have a DH. It just spreads around the responsibility to score runs. Nine players have the responsibility that used to belong to eight.

The relief ace is sort of the same. If we compare the teams that did have a relief ace to those that didn’t, the responsibility to prevent runs is the same; it is merely shared by more pitchers.

Of course, it is not exactly the same, because teams either have a DH or they don’t, and the teams which have Designated Hitters compete with other teams that have Designated Hitters; you don’t have DH teams playing non-DH teams. With the Relief Ace we don’t have such a clean line between “closer” teams and “non-closer” teams. There are teams prior to 1946 that did have relief aces; there are teams after 1946 that really had no bullpen. There were teams that had relief aces that competed with teams that didn’t—and, of course, after a few years the “bullpen” starts to multiply, more bullpen roles, more bullpen roles, more bullpen roles. It’s not entirely clear how we should deal with this. I made a decision about how to handle it, but there’s nothing about it in the notes from Mt. Sinai.
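The scaling to 13 slots is simple enough to sketch in Python. The slot values below are copied from the five-team chart that appears later in this article (the 1961 Reds with 13 slots and the 1973 A's with 14); the function name is invented, and the scaled totals are computed by the sketch, not quoted from the article.

```python
def team_total(slot_values):
    """Average slot value multiplied by 13, whatever the number of slots."""
    return sum(slot_values) * 13 / len(slot_values)


# C, 1B, 2B, 3B, SS, LF, CF, RF, S1, S2, S3, S4, RA
reds_1961 = [7, 8, 15, 12, 11, 12, 18, 38, 9, 13, 10, 1, 12]
# C, 1B, 2B, 3B, SS, LF, CF, RF, DH, S1, S2, S3, S4, RA
athletics_1973 = [13, 16, 12, 21, 23, 16, 12, 40, 11, 30, 25, 20, 10, 40]

print(team_total(reds_1961))                  # 13 slots: the raw sum is unchanged
print(round(team_total(athletics_1973), 1))   # 14 slots: raw sum scaled by 13/14
```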

Anyway, let’s compare five teams by our method: the 1961 Baltimore Orioles, the 1961 Cincinnati Reds, the 1970 Minnesota Twins, the 1973 Oakland A’s, and the 1986 Red Sox. In terms of what they accomplished, all of these teams are similar, all winning 93 to 98 games.

Year	Team	W-L	Pct	Result
1961	Cincinnati	93-61	.604	Lost World Series
1961	Baltimore	95-67	.586	Finished third
1970	Minnesota	98-64	.605	Won Division
1973	Oakland	94-68	.580	Won World Series
1986	Boston	95-66	.590	Lost World Series

But in terms of their underlying talent, as we’ll see, the teams are very different. These are their third basemen, and how they score. Gene Freese, third baseman for the 1961 Reds, played 1,115 games in his career, and was 27 years old in 1961. That scores, by our system, at “12”:

Year	Team	Third Base	G	Age	Other	Score
1961	Reds	Gene Freese	1,115	27		12
1961	Orioles	Brooks Robinson	2,896	24	HOF	38
1970	Twins	Killebrew	2,445	34	HOF	30
1973	A’s	Sal Bando	2,019	29		21
1986	Red Sox	Wade Boggs	2,440	28	HOF	36

Three of these teams have Hall of Fame third basemen; Killebrew scores a little lower because he was older at the time in question. The A’s third baseman was Sal Bando, who was a good player if not quite a Hall of Famer, and the Reds’ third baseman was Gene Freese, who was not a player of the same caliber (although he had a good year with the bat in 1961.)

This chart summarizes the personnel on the five teams, with bold face noting the superstars:

YEAR	Team	C	1B	2B	3B	SS	LF	CF	RF	DH	S1	S2	S3	S4	RA
1961	Reds	7	8	15	12	11	12	18	38		9	13	10	1	12
1961	Orioles	17	10	11	38	13	14	13	7		20	13	10	4	23
1970	Twins	12	9	6	30	20	7	16	18		34	19	21	1	19
1973	A's	13	16	12	21	23	16	12	40	11	30	25	20	10	40
1986	Red Sox	14	18	10	36	8	28	14	22	16	34	18	8	5	21

I generally listed a team’s best pitcher as the #1 starter, but it doesn’t really matter which spot a starting pitcher goes into, and sometimes they’re scrambled. When you total the position values up, you get this:

YEAR	Team	C	1B	2B	3B	SS	LF	CF	RF	DH	S1	S2	S3	S4	RA	Total
1961	Reds	7	8	15	12	11	12	18	38		9	13	10	1	12	166
1961	Orioles	17	10	11	38	13	14	13	7		20	13	10	4	23	193
1970	Twins	12	9	6	30	20	7	16	18		34	19	21	1	19	212
1973	A's	13	16	12	21	23	16	12	40	11	30	25	20	10	40	289
1986	Red Sox	14	18	10	36	8	28	14	22	16	34	18	8	5	21	252
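The slot adjustment described earlier can be applied to these totals. What follows is a minimal sketch, not the author's own code: the totals and slot counts are read off the table above, while the function and variable names are illustrative assumptions.

```python
# Raw totals and number of regular slots, read off the table above.
# 13 slots = 8 position players + 4 starters + relief ace; 14 adds the DH.
BASELINE_SLOTS = 13

teams = [
    ("1961 Reds", 166, 13),
    ("1961 Orioles", 193, 13),
    ("1970 Twins", 212, 13),
    ("1973 A's", 289, 14),
    ("1986 Red Sox", 252, 14),
]


def adjusted_total(raw_total, slots, baseline=BASELINE_SLOTS):
    """Scale a team total to the 13-slot baseline (average slot value times 13)."""
    return raw_total * baseline / slots


adjusted = {
    name: round(adjusted_total(total, slots), 1)
    for name, total, slots in teams
}

for name, value in adjusted.items():
    print(f"{name:<14}{value:>7.1f}")
```

Under these assumptions the three non-DH teams keep their raw totals, the A's drop from 289 to about 268, and the Red Sox drop from 252 to 234.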

When you adjust for the DH rule, that draws the 1973 A’s and 1986 Red Sox back toward the other teams—but still, when you look at the personnel on the teams, it becomes apparent that they’re really not comparable. The 1973 A’s were much, much stronger, in terms of who was on the team, than the other teams, and in particular were almost beyond comparison to the 1961 Cincinnati Reds:

Position	1973 A’s	Score	1961 Reds	Score	Advantage
Catcher	Ray Fosse	13	Jerry Zimmerman	7	A’s
First Base	Gene Tenace	16	Gordy Coleman	8	A’s
Second Base	Dick Green	12	Don Blasingame	15	Reds
Third Base	Sal Bando	21	Gene Freese	12	A’s
Shortstop	Campaneris	23	Eddie Kasko	11	A’s
Left Field	Joe Rudi	16	Wally Post	12	A’s
Center Field	Bill North	12	Vada Pinson	18	Reds
Right Field	Reggie	40	Frank Robinson	38	A’s
#1 Starter	Catfish	30	Joey Jay	9	A’s
#2 Starter	Vida Blue	25	Bob Purkey	13	A’s
#3 Starter	Holtzman	20	Jim O’Toole	10	A’s
#4 Starter	Blue Moon	10	Ken Hunt	1	A’s
Closer	Rollie Fingers	40	Jim Brosnan	12	A’s
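The slot-by-slot comparison above can be tallied mechanically. This is only a rough sketch using the scores as listed; the variable names are illustrative, and the A's DH slot (11 points) is left out because the 1961 Reds had no counterpart.

```python
# (slot, 1973 A's score, 1961 Reds score), copied from the comparison above.
matchups = [
    ("Catcher", 13, 7),
    ("First Base", 16, 8),
    ("Second Base", 12, 15),
    ("Third Base", 21, 12),
    ("Shortstop", 23, 11),
    ("Left Field", 16, 12),
    ("Center Field", 12, 18),
    ("Right Field", 40, 38),
    ("#1 Starter", 30, 9),
    ("#2 Starter", 25, 13),
    ("#3 Starter", 20, 10),
    ("#4 Starter", 10, 1),
    ("Closer", 40, 12),
]

# Count the slots each team wins, then sum the point gap over the shared slots.
as_edges = sum(1 for _, a, r in matchups if a > r)
reds_edges = sum(1 for _, a, r in matchups if r > a)
gap = sum(a - r for _, a, r in matchups)

print(f"A's ahead at {as_edges} slots, Reds ahead at {reds_edges}")
print(f"A's lead by {gap} points over the 13 shared slots")
```

By this count the A's are ahead at 11 of the 13 shared slots, with a combined margin of 112 points (278 to 166).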

In right field both teams had the MVP in the season in question. Reggie ranks a couple of points ahead of Frank in terms of “expected value” (40-38) because Reggie was 27 years old that year, whereas Frank Robinson was 25 in 1961. That’s not a real difference, but. . .the A’s are just a lot better. Yes, the winning percentage of the 1961 Reds was 24 points higher than the winning percentage of the 1973 A’s (.604 to .580), but the Reds immensely over-achieved in 1961, as a team, while the 1973 A’s coasted through the regular season, generally under-achieving. In terms of the talent on the team, the Reds weren’t on the same planet as the A’s.

Problems with the Method

What we are trying to do here, to measure what teams SHOULD have accomplished, is inherently impossible, but I certainly don’t want to use that as an excuse to cover my mistakes. The difference between knowledge and bullshit is that knowledge edges forward in small steps, each step providing a platform for the next. Bullshit tries to leap forward, but each leaper must start over at the beginning. What I’ve done here, I would hope, could be used as a basis to do it again, only better.

I alluded in the introduction to the fact that my system is useless in dealing with contemporary teams, and is fairly useless in dealing with teams that played even ten years ago or a little more. The method relies on having a complete picture of the player’s career—which means that you have to wait until years after the fact to see how things sort out. This is a major drawback in my approach. However, if we did do a complete survey of all teams, then at least we could compare teams to other teams in the same year.

The other problems with my method could be summarized in three categories:

1) In some cases our estimates of a player’s expected performance are not as accurate as they could be,

2) Our system ignores the contributions of bench players, and of the bullpen beyond the one reliever,

3) Our system of adjusting for expected production by age is merely a simple heuristic, and could no doubt be improved.

Dealing with the first issue, which is the largest one, I would generalize about it in this way: that about 80% of the time our estimates seem to me to be accurate or reasonably accurate, about 15% of the time they seem somewhat questionable, and about 5% of the time the system doesn’t work and the estimate is just absolutely wrong.

Why does the system sometimes not work? Well, sometimes there are very good players who have short careers. To cite the very worst example that has occurred in my research so far, we list Pete Reiser in 1941 at 6 points, based on Reiser’s youth (he was 22) and his short career. But Reiser in 1941 was a great player, and the very low “expectation number” for him is a hindrance to our analysis.

Doc Cramer had a long career but was not really that good. Joe DiMaggio had a fairly short career, but was quite good.

These estimates could be improved, as long as the enhancements to the method are organized and systematic. We could, for example, credit an extra 400 games played to anyone who was a regular in 1942 and a regular in 1946, but missed the three years in between. That would cover the Joe DiMaggio problem. We could discount Doc Cramer, somehow, by showing that, although he played 2,239 games in his career—about the same number as Al Simmons, Goose Goslin and John Olerud—he was nowhere near the player that Al Simmons was, or Goose Goslin or John Olerud.
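The wartime credit suggested above is simple enough to state as a rule. The sketch below is one possible reading of it, not something the study actually implements: the function name, the "regular seasons" input, and the sample season lists are all illustrative assumptions, and 1,736 is DiMaggio's commonly listed career games total.

```python
WAR_YEARS = {1943, 1944, 1945}


def credited_games(career_games, regular_seasons, bonus=400):
    """Return career games plus the wartime bonus, if the rule applies.

    The rule (as suggested in the text): a regular in 1942 and in 1946
    who was not a regular in any of the three seasons in between.
    """
    seasons = set(regular_seasons)
    bracketed = {1942, 1946} <= seasons
    missed_all_three = seasons.isdisjoint(WAR_YEARS)
    if bracketed and missed_all_three:
        return career_games + bonus
    return career_games


# Illustrative inputs: DiMaggio as a regular 1936-42 and 1946-51.
dimaggio_seasons = list(range(1936, 1943)) + list(range(1946, 1952))
dimaggio_credit = credited_games(1736, dimaggio_seasons)

# A hypothetical player who stayed a regular through the war gets no bonus.
no_gap_credit = credited_games(2000, range(1938, 1950))

print(dimaggio_credit, no_gap_credit)
```

A rule of this kind stays "organized and systematic" in the sense the text asks for: it is triggered by seasons played, not by the analyst's opinion of the player.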

We could replace the 10-point Hall of Fame bonus with some other objective measure of a player’s “career quality of performance”. I could have based the system on Win Shares, of course, but that’s not exactly what Win Shares were designed to do, and I hate to create confusion about what Win Shares are supposed to do. But this could have been done better than I have done it.

Sometimes career length doesn’t track with the quality of a player’s career because players were in mid-career when the color line broke. One could systematically adjust for that. For the Bucky Walters case, we could make a rule limiting which players could count their innings pitched as part of the oeuvre of their careers. Babe Ruth’s pitching career reduced his games played because Ruth was both a great pitcher and a great hitter, so that the two competed with one another—but Bucky Walters’ innings pitched are attached to his games played, as an infielder, rather like a tumor. It’s a misrepresentation of his value, in his years as an infielder.

For the closers, I probably should have used. . . I don’t know. Career Saves doesn’t work, because save totals are so inconstant over time. Maybe I could have used career games finished, or somehow have estimated a “leverage index” for each pitcher. Perhaps I should have counted All-Star game appearances for relievers. There’s a way to do it; I just didn’t find it.

Even the problem with the system working only years after the fact. . . that could be avoided, not in this system but in some other system designed to do the same thing. You could invent a system to evaluate how teams should perform based on the ages and previous credentials of the players on the roster on opening day. You’d have to be a better programmer than me to make that work, but it could be done.

My general point is, most of these glitches that crop up in our evaluations could be avoided, if you wanted to take the time and make the effort to invent a rule that applies in each case to a very limited number of players. I was trying, as I am always trying, to make a system that was simple and cuts directly toward the truth. In the end, my system is not that simple—it takes me ten pages to explain it—and not that direct. But if we started inventing ways to cover all of these little glitches, the system would be labyrinthine and would take me 40 pages to explain. And I would never have written this article, because it would have taken me two years, rather than two months, to get this much of the research done.

COMMENTS (17 Comments, most recent shown first)

hammer2525
Bill, I was not commenting on your article or method. I was responding to James Mohl, disagreeing with him about his OPS suggestion and how I thought it wouldn't work for this type of study because of the relatively small amount of data (I thought) for a player rating. Of course Chipper is better than Davey. I was trying to suggest (and should have been more specific) that using OPS in 4 seasons would favor players who only play well for a couple of years.
1:58 PM Mar 20th

wovenstrap
To jalbright's point, if you just credited people with games played to that juncture, it's not a similar measure at all, for a couple reasons. First, you'd be penalizing young teams, period. The 1983 Phillies would come out as the most talented team of all time, on paper. Second, even if everyone were the same age at a given moment, most players enter the league somewhere between 21 and 24, and therefore there might not be that much difference between Albert Pujols and Aubrey Huff. The key thing about games played is how long you eventually stick around, not how early you got into the league (although that counts too). And third, it would make teams from the same dynasty appear to get radically better with each passing year -- every year, the 1970s Reds would get 10% better like clockwork, just because they were getting older.
11:10 AM Mar 19th

benhurwitz
Bill, wonderful writing as usual. But please remember: it's "Downtown" Ollie Brown.
1:15 PM Mar 18th

jalbright
Win Shares could be a starting point with some of the age and other adjustments made in the article. Besides, you could limit yourself to what the player had done to that point, which would enable you to work with more recent teams and not have the issue of events after the projected year affecting the expectation for that year.
10:27 AM Mar 18th

jrickert
I'm wondering how it was decided that someone was a closer. What was Dennis Eckersley?
Also, what makes a catcher? Does Joe Torre get his value calculated by treating him as a catcher or as a non-catcher?

12:16 AM Mar 18th

wovenstrap
Absolutely brilliant -- an attempt to pin down a player's generalized worth in any given season. I think some of the other comments here are a little too hung up on the details of a player's performance, whereas the idea is just to get a basic thumbnail, as you have done. Instead of worrying about Davey Johnson, ask: "what's a journeyman 2b with 1435 GP at age 30 worth?" THAT tells you a hell of a lot more about what people were expecting from Davey Johnson that year, and it does him the favor of demonstrating just by how much Davey overachieved that year. Also, GP is one of the most available statistics of anyone's career, and it's relatively neutral (park effects are hardly a factor).

8:12 PM Mar 17th

tbell
It seems odd to me to use career length as the basis for assessment of player value here. Not that career length doesn’t correlate well with career value, as Bill says. But the length of a player’s career is determined by perceived value, not actual value.

Which, of course, are also very well correlated. But it would seem to me that the interest of this particular exercise is precisely to identify differences in perceived vs. actual value - to identify players and teams that were perceived to be excellent, but proved not to be quite as good as they were expected to be.

For example, to depend on a player like Brooks Robinson to win as many games for you as Frank Robinson is the kind of mistake that usually leads to disappointment in September. Yet I don’t doubt that most Oriole fans - and front-office people - figured that their team was as well set at third base as any team was at any position.

So, I would guess that for the good Brooks Robinson and the great Frank Robinson to have the exact same rating for 1961 - as Bill’s method here has done - corresponds reasonably well with their perceived value (at least in retrospect).

But certainly not with their actual value, then or ever.
11:34 AM Mar 17th

bjames
Maybe what I should have said is this: that if you were actually to try to develop these alternative ideas, rather than simply pitching them forward, you would find that they do not work in a direct-line fashion. You might be able to develop a series of adjustments to make them work, but. ..that is work.
Not following the point about Chipper Jones and Davey Johnson. Chipper Jones is a much better player than Davey Johnson; our system says that Chipper Jones is a much better player than Davey Johnson. What exactly is the problem?
11:04 AM Mar 17th

Trailbzr
Maybe the explanation why WS won't work will come out in a subsequent episode.
But it seems that averaging Al Kaline's WS in 61,62,64,65 is some kind of measure of his paper value in 63.
8:04 AM Mar 17th

bjames
The OPS+ absolutely would not work at all. . .could not be made to work. There are other approaches that would work, but the devil is in the details. I don't see how Win Shares could be used to address this problem. Perhaps you do, but until you make it work, we don't know.
2:45 AM Mar 17th

jollydodger
This is such a major undertaking....I'm amazed and surprised.
12:02 AM Mar 17th

tbell
This method seems reasonable. But why not use the Career Assessments/"Favorite Toy" methodology to generate an Expected/Established Level of Win Shares for any given player in any given season? Might be simpler and better.
5:19 PM Mar 16th

Trailbzr
Frankjm, not sure what aspect of using Win Shares you're raising...
a) Mr. Mohl's method would use the two previous and next year's stats, but not the current year's. And I'd do the same for WS.
b) Mr. James says WS doesn't discriminate against players on bad (or good) teams.
3:08 PM Mar 16th

hammer2525
I think that too many good players would have lower values because of injuries and too many not so good players would be overvalued. One real good year in four would mean much more than one good year in a career. Chipper Jones for the first and Davey Johnson for the second? Not sure, might be missing your point.
3:03 PM Mar 16th

frankjm
OPS+ doesn't factor in a player's defensive contribution (or baserunning, GDP...).

And win shares are based on how many games the team actually won, which we're kind of trying to get away from here.
2:52 PM Mar 16th

Trailbzr
... or his Win Shares.
12:58 PM Mar 16th

fjm235
Much more complicated than necessary. Why not just take each player that had significant playing time and calculate his weighted average OPS+ (or ERA+) based on the 2 seasons prior and 2 seasons after the year in question?
12:52 PM Mar 16th
