
Presumptive Value

By Bill James

March 23, 2018

"Who Should Have Won?"

What actually happened is, in a certain sense, the Sacred Truth of Sabermetrics. Who should win and who should be good are questions we ask about the future. At the start of the 2019 season we will ask ourselves who should win in 2019, and at the start of a player’s career we ask how good a player he should be. Once the event has happened, we lose almost all interest in these questions, and, within the field of sabermetrics, we lose all interest in these questions. We have no methods to address them, retroactively—and we actively avoid using the tools that we have to predict the future on retroactive questions, because the inevitable consequence of doing that is that it shows that we were wrong, and thus that the tool didn’t work perfectly.

It could be, sometimes, that when we lose interest in these questions, we lose understanding about what has in fact happened.

All of the ideas that I have shared with the world acquire whatever value they have not when they are first ideas or when I first work on them, but after they go through generations of refinements. I have published stuff about this idea before, but over the weekend (March 16-18) I had a conceptual breakthrough that enabled me to move the ball forward on this one—that is, to produce some actual results which are at least sort of interesting. Having done that, I realize—as I normally do—that there are also critical shortcomings in the method, which will have to be dealt with in future evolution of the method.

So this is the idea. Suppose that we ask, about the 1967 season, not "who won, and why?" but "who should have won? Who actually had the most talent on their roster?"

Well. . . how do you measure the amount of talent that a team has on their roster?

Suppose that you identify every player in baseball history as a "10", a "9", an "8", a "7", a "6". . . .or he could be a "0". Some players—many players, actually—have so little impact on their teams that the possession of this "talent", the unit of talent that this player represents, has no meaningful impact on who wins the pennant or who should have won the pennant. Obviously it’s a pyramid; there are more 9s than 10s, more 8s than 9s, more 7s than 8s, etc. This may be obvious, but we are not sorting players based on their performance in any season, but rather, on their career performance.

First question. . . .how many "tens" should there be? I decided that 50 was the best number. I tried 100, I tried 20, but 50 is the number that works. If you do 100, then you include in the top 100 players who are clearly a meaningful notch below the Stan Musial-Ted Williams-Mickey Mantle-Willie Mays-Mike Schmidt-Roger Clemens-Barry Bonds level that represents the highest-impact players. If you do 20 on the top level, it squeezes the pyramid so that the mid-level players have much more than five times as much value as the bottom feeders.

I designed the levels such that

1) There would be a constant ratio between levels; that is, the ratio between the number of players on Level 2 and those on Level 3 would be the same as the ratio between 3 and 4 or the ratio between 4 and 5,

2) The total number of players would be essentially equal to the total number of players in major league history, and

3) There would be 50 players on the highest level, 50 "tens".

I would assume there is only one set of numbers that would fit all of those conditions. In any case this creates 50 players who are designated as "tens", 82 who are "nines", 136 who are "eights", 225 who are "sevens", 371 who are "sixes", etc. There are thousands and thousands of players who are zeroes.
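The three conditions above can be checked with a little arithmetic. The sketch below is an illustration, not the method actually used here: it assumes the levels run from 10 down to 0, and it assumes a round figure of 19,000 for the total number of major leaguers (the text does not give the number it used). Because the pyramid's total grows steadily as the ratio grows, exactly one ratio satisfies all three conditions, which supports the assumption that only one set of numbers fits.

```python
def pyramid_counts(top_count=50, total_players=19_000, top_level=10):
    """Split total_players into levels top_level..0 so that the top level
    holds top_count players and each level holds a constant multiple
    (ratio) of the level above it. total_players is an assumed figure."""
    n_levels = top_level + 1  # e.g. levels 10, 9, ..., 0

    def total(ratio):
        # Geometric series: top_count * (1 + ratio + ratio**2 + ...)
        return sum(top_count * ratio**k for k in range(n_levels))

    # total(ratio) rises steadily with ratio, so bisection finds the one
    # ratio whose pyramid adds up to total_players.
    lo, hi = 1.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if total(mid) < total_players:
            lo = mid
        else:
            hi = mid
    ratio = (lo + hi) / 2
    counts = {top_level - k: round(top_count * ratio**k) for k in range(n_levels)}
    return ratio, counts


ratio, counts = pyramid_counts()
print(round(ratio, 2))  # about 1.65
print([counts[level] for level in range(10, 5, -1)])
```

Under that assumed total, the ratio comes out near 1.65 and the top five levels land within about one player of the 50, 82, 136, 225 and 371 given above, with several thousand players left at level 0. The `top_level` parameter would also cover a finer 20- or 25-point scale.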

How do we decide who is a 10, who is a 9, etc.? I did this by Win Shares (or, for some players, estimated Win Shares, since I don’t have a fully organized data base of Win Shares). The players were sorted by the sum of two numbers. The first was Career Win Shares. The second was the player’s highest three-year Win Share total, times 5. These two numbers are generally on the same level, so that Career Value and Peak Value have roughly equal impact in the rankings, or at least this was my intention.
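A minimal sketch of that sorting score, assuming a player's record is just a list of season Win Share totals. The text does not say whether the "highest three-year total" must be three consecutive seasons, so the hypothetical `consecutive` flag offers both readings; `peak_weight` defaults to the 5 used here, and can be set to 4 to try the alternative weighting.

```python
def ranking_score(season_win_shares, peak_weight=5, consecutive=False):
    """Career Win Shares plus peak_weight times the peak three-year total.

    Whether the three peak years must be consecutive is an assumption;
    both readings are offered via the consecutive flag."""
    career = sum(season_win_shares)
    if consecutive:
        # Best run of three straight seasons (whole career if shorter).
        peak = max(sum(season_win_shares[i:i + 3])
                   for i in range(max(1, len(season_win_shares) - 2)))
    else:
        # Three best seasons, wherever they fall in the career.
        peak = sum(sorted(season_win_shares, reverse=True)[:3])
    return career + peak_weight * peak


# Two hypothetical careers, invented for illustration only.
steady = [20] * 15               # 300 career Win Shares, peak of 60
flash = [5, 40, 45, 42, 10]      # 142 career Win Shares, peak of 127
print(ranking_score(steady), ranking_score(flash))
print(ranking_score(steady, peak_weight=4), ranking_score(flash, peak_weight=4))
```

For the steady player the two halves balance exactly at 5X (300 career, 5 × 60 = 300 peak); for the short, high-peak career the peak term is more than four times the career term, which is the kind of imbalance that a lower multiplier would soften.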

To choose the 50 players who are "tens", I first divided them as 32 position players and 18 pitchers. The 32 position players are four from each position—Nap Lajoie, Eddie Collins, Rogers Hornsby and Joe Morgan at second base, Eddie Mathews, Mike Schmidt, George Brett and Wade Boggs at third base, etc. There were no "tens" at the Designated Hitter position, but I had 51 position players who would be nines, so I chose six from each position plus three designated hitters who would be "nines" (Frank Thomas, David Ortiz and Edgar Martinez). There were 82 position players who would be "eights", so that was ten at each position plus a couple of DHs, and there were 133 position players who would be "sevens", so that was 16 at each position plus five DHs. Below "7" I just broke off the position counts and sorted players by their score; the next 214 were "sixes", then 344 "fives", etc.
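The position-quota step can be sketched roughly as follows, for position players only (pitchers would need a parallel pass, which is left out). The function name and data layout are hypothetical, and the seven players A through G are invented. For the real study the quotas in the text would read (10, 4, 0), (9, 6, 3), (8, 10, 2) and (7, 16, 5), followed by open levels of (6, 214), (5, 344) and so on.

```python
def assign_levels(players, quota_levels, open_levels):
    """Assign career levels to position players (hypothetical helper).

    players      -- list of (name, position, score); names assumed unique
    quota_levels -- [(level, per_position, dh_count), ...], highest first;
                    each level takes the best remaining players at every
                    fielding position up to per_position, plus up to
                    dh_count designated hitters
    open_levels  -- [(level, count), ...]; below the quota levels, players
                    are taken purely in score order, ignoring position
    Anyone left over is a 0.
    """
    remaining = sorted(players, key=lambda p: p[2], reverse=True)
    levels = {}
    for level, per_position, dh_count in quota_levels:
        taken = {}  # position -> players placed at this level so far
        for name, position, _ in remaining:
            cap = dh_count if position == "DH" else per_position
            if taken.get(position, 0) < cap:
                levels[name] = level
                taken[position] = taken.get(position, 0) + 1
        remaining = [p for p in remaining if p[0] not in levels]
    for level, count in open_levels:
        for name, _, _ in remaining[:count]:
            levels[name] = level
        remaining = remaining[count:]
    for name, _, _ in remaining:
        levels[name] = 0
    return levels


players = [("A", "CF", 100), ("B", "CF", 95), ("C", "1B", 90), ("D", "1B", 50),
           ("E", "DH", 80), ("F", "DH", 40), ("G", "CF", 30)]
print(assign_levels(players, [(10, 1, 0), (9, 1, 1)], [(8, 1)]))
```

In this toy run B outscores C but lands a level lower, because the center-field slot at the top level is already filled. That is the same situation as DiMaggio and McCovey, which is why a strict per-position quota sometimes needs a hand correction.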

After seeing the results I made just a handful of adjustments. One prefers to use strictly consistent standards for classification, but I had Joe DiMaggio as the fifth center fielder (behind Cobb, Speaker, Mays and Mantle), while Willie McCovey was the fourth first baseman (behind Gehrig, Foxx and Albert Pujols). Thus, DiMaggio was a "nine" and McCovey was a "ten". I felt that that was clearly and unarguably wrong, and I reversed them. DiMaggio (a) was deprived of probably 100 Win Shares or more by World War II, (b) still had a higher score than McCovey, and (c) there is no rule of nature which says that the number of players at the highest level of stardom has to be exactly the same at each position; that was just an organizing decision. So I "fixed" two or three players whose ratings seemed quite certainly to be wrong (Roy Campanella, for example, was an 8; I decided to make him a 9), but basically I just went with whatever the formulas said, whether I agreed with the value or not.

The idea is that we can decide how good a team the 1983 Cleveland Indians or the 1966 Pittsburgh Pirates ought to have been, based not on their performance during that season and not on our subjective judgments, but based on the quality of the players on their roster. But then, obviously Stan Musial with the 1960 St. Louis Cardinals is not Stan Musial with the 1948 St. Louis Cardinals. To treat him as the same player, regardless of his age, would lead us to misleading conclusions. The Albert Pujols who plays for the Los Angeles Angels today is nothing like the Albert Pujols of 2006. What do we do about that?

I decided to vary the player’s "presumptive value" in this way:

The player’s presumptive value from ages 25 to 30 was the level assigned to him—10 for Ted Williams and Rickey Henderson, 5 for Placido Polanco, Alex Rios and Mickey Rivers, 1 for Sal Butera, Putsy Caballero, Raul Casanova and Pedro Ciriaco.

At ages 24, 31 and 32, the player’s Presumptive Value is one point less than his "prime" Presumptive Value. A 1-point player has no Presumptive Value when he is not in his age-25-to-30 prime.

At ages 23 and 33, the player’s Presumptive Value is two points less than his prime presumptive value.

At age 22, the player’s Presumptive Value is four points less than his prime presumptive value; at age 21, 6 points less; at age 20, 8 points less; and at age 19, 9 points less (although never less than zero.) A 10-point star has a presumptive value of 1 point if he is in the majors at the age of 19.

After age 33, the player’s Presumptive Value diminishes by one point each season until it reaches zero. . .thus, a "ten" has a presumptive value of zero at age 41, a "nine" has a presumptive value of zero at age 40, an "eight" has a presumptive value of zero at age 39, etc.
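The age schedule above can be written as one small function. This is a sketch that follows the deductions as stated; ages under 19 are not covered in the text, so the code assumes they give zero.

```python
def presumptive_value(level, age):
    """Presumptive Value of a player with career level `level` (0-10) at `age`.

    Deductions follow the schedule in the text; ages under 19 are not
    covered there and are assumed here to give zero."""
    if 25 <= age <= 30:
        deduction = 0
    elif age in (24, 31, 32):
        deduction = 1
    elif age in (23, 33):
        deduction = 2
    elif age > 33:
        deduction = 2 + (age - 33)  # one more point for each season past 33
    elif age in (19, 20, 21, 22):
        deduction = {22: 4, 21: 6, 20: 8, 19: 9}[age]
    else:
        deduction = level  # assumption: under 19 counts as zero
    return max(0, level - deduction)  # never less than zero


# Cases discussed in the article.
print(presumptive_value(9, 20))   # Al Kaline, 1955
print(presumptive_value(10, 23))  # Sandy Koufax, 1959
print(presumptive_value(10, 38))  # Warren Spahn, 1959
print(presumptive_value(9, 40))   # David Ortiz, 2016
```

The results match the figures quoted in the article: Kaline's 1, Ortiz's zero, and a "ten" reaching zero at 41, a "nine" at 40, and an "eight" at 39.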

Yes, it will sometimes happen that a player who has a presumptive value of zero (or close to it), because of his age, will have a great season, like Ted Williams in 1960 or Al Kaline in 1955. Kaline was a "nine", but because he was only 20 years old that summer, his presumptive value was "1". You just don’t figure that a 20-year-old player is going to lead the league in hitting with 27 homers and 102 RBI. You don’t count on that. When you are examining the makeup of the team, that’s not inherent in the roster; it’s an unexpected outcome. David Ortiz in 2016 was an unexpected outcome. And I can tell you, because I was there: we didn’t expect anything from David in 2016. He had reached the 500-homer plateau at the end of the 2015 season; we thought that was probably his last hurrah, and his value would crater after that. We were wrong, but my point is that when the system says that the presumptive value of a 40-year-old "9" is zero, the system is not wrong. That was what we presumed his value would be.

By reducing the complexities of each player’s skill level to a single number, we can evaluate not what the team DID, but what they might have been expected to do based on who was on the roster and how old he was. Once you have answers to that question, then you can get answers to an entire room full of questions to which that one is the doorway. What was the greatest overachieving team of the season? What was the greatest overachieving team of all time? What was the greatest underachieving team of the season?

How often does the team that ought to win the pennant actually win the pennant? 90% of the time, or 50%? If a team underachieves in one season (1966), is it likely that they will underachieve again in the next season (1967)? Or if a team underachieves by a historic margin in one season, is it likely that they will show improvement the next season?

If two teams head into the World Series even at 93-69 each, but one of them actually has a stronger roster than the other, does the team with the stronger roster have an advantage in the post season? How does this map to managers? Do great managers, like Earl Weaver and Gene Mauch and Joe Torre, get more out of their roster on a consistent basis than other managers, or is it merely that they have better rosters to work with?

As I said, there is an entire room full of questions there, and obviously it is beyond the scope of this one article to answer all of those questions. There is also another room full of lower-level questions, like "What are the best seasons ever for players who had a presumptive value of zero?" and "What are the worst seasons ever for players who had a presumptive value of ten?" Etc.

So I established Presumptive Values for every player for every season in baseball history, and based on that I figured Presumptive Values for about 60 teams.

All teams from the 1959 season,

The Philadelphia Phillies from 1950 to 1960,

The Dodgers from 1952 to 1962,

The Giants from 1954 to 1964, and

The Reds from 1956 to 1967.
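The count of teams can be checked by treating each team-season as a (franchise, year) pair, so that the 1959 Phillies, Dodgers, Giants and Reds, which appear both in the 1959 list and in their franchise runs, are counted only once. The franchise labels are just names chosen for this sketch.

```python
# Every 1959 team, by franchise nickname.
AL_1959 = ["Yankees", "White Sox", "Tigers", "Senators",
           "Indians", "Red Sox", "Orioles", "Athletics"]
NL_1959 = ["Dodgers", "Braves", "Giants", "Pirates",
           "Cubs", "Reds", "Cardinals", "Phillies"]

team_seasons = {(team, 1959) for team in AL_1959 + NL_1959}

# Franchise runs from the list above (inclusive year ranges).
for team, first, last in [("Phillies", 1950, 1960), ("Dodgers", 1952, 1962),
                          ("Giants", 1954, 1964), ("Reds", 1956, 1967)]:
    team_seasons.update((team, year) for year in range(first, last + 1))

print(len(team_seasons))  # 57
```

That is 16 + 11 + 11 + 11 + 12 = 61 team-seasons, minus the four 1959 duplicates, which gives the 57.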

What is that. . .57 teams? I may have done a couple of other one-offs; doesn’t matter. Anyway, just by doing these studies, I do feel that I improved my understanding of several issues. Thus, I do feel that this line of research is worthy of pursuing if I can get a little more time to do it. Here’s what I learned.

Regarding 1959, the system suggests that, while the White Sox won the American League pennant, they were actually a long way from being the most talented team in the league. The Yankees had the most talent—not a surprising conclusion at any level—but also the Tigers and, most surprisingly, the Washington Senators actually had better rosters.

The Senators vs. the White Sox. . .that’s a 72 to 71 thing; you can’t take that too seriously. The system is just recognizing that the Senators are compiling the talent that would lift them into contention two or three years later. But the National League results are much more surprising.

I have written many times that the 1959 Milwaukee Braves were the under-achievers to end all under-achievers, while the 1959 Dodgers won a pennant that they in no way should ever have been expected to win. What I have written (and what I still believe, actually) is that there has never been another case in baseball history in which a team as bad as the 1959 Dodgers has beaten a team as good as the 1959 Braves. I rant about this on a pretty regular basis.

But this system says "WAIT A MINUTE." The system has a shockingly different view of the contest—and it is not that easy to say that the system is wrong. The 1959 Braves had three "tens" on their roster—which is a hell of a thing in itself, since there are only 50 of them in baseball history; the Kansas City Royals have only ever had one, who was George Brett. The 1959 Braves had three tens, and all three of them had what you might say are career-defining seasons; Henry Aaron had perhaps his greatest season, Eddie Mathews perhaps his second-greatest, and Warren Spahn won his usual 21 games.

But when I score each player based on his Presumptive Value and add up the team totals, the Dodgers—who won the pennant—come out as the strongest team in the league, while the Braves stagger in in fourth place.

When you dig into the details it makes a certain amount of sense. (The Cubs, by the way, had BY FAR the worst roster in the major leagues, despite having the MVP, Ernie Banks, with a Presumptive Value of 9. Their team total is 46—less than half of the Dodgers.) Anyway, the system says that I have been giving the Dodgers a pass on their under-achieving players. Maury Wills in 1959 was 26 years old. He started the season in the minor leagues, won the shortstop job in mid-season, and had only 242 at bats.

As the system sees it, Maury Wills was a damned good player—a "seven"—and he was 26 years old, so he should have been in his prime. His presumed value is seven. What he actually produced is. . .well, I don’t know, but it isn’t 7 on a 10-point scale.

Duke Snider is a "9", and Duke Snider in 1959 was only 32 years old. A 32-year-old "nine" has a Presumed Value of "8". OK, he hurt his back in ’58 and this limited his value, but that is about what actually happened, not about what could have been expected to happen.

Or Sandy Koufax and Warren Spahn. Sandy Koufax and Warren Spahn are both "tens"; there are only 18 pitchers in history who are "tens", but it happens that each of these teams has one. However, neither of them is a "ten" in 1959, because neither one is in his prime. Sandy Koufax was 23, and Spahn was 38.

Warren Spahn had more actual value than Koufax, Spahn pitching 292 innings with a 2.96 ERA, and Koufax pitching only 153 innings with a 4.46 ERA. But that is what actually happened; this study is about what could reasonably have been expected to happen. The question is, if you have a pitcher who is a "ten" and you have him only for one season, would you rather have his age-23 season or his age-38 season?

Well, of course you would rather, on average, have his age 23 season. In essence, I have been giving the Dodgers a pass on the fact that Koufax was slow to figure it out, just as I have been giving them a pass on Maury Wills making very slow progress and a pass on Duke Snider’s injury, while I have been holding the Braves responsible for not winning the pennant with Warren Spahn having a high-impact season—while ignoring the fact that Spahn was 38 years old. Actually, says this system, the Dodgers won because they should have won. They had more talent to work with. The thing is that they almost blew it.
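A team total is just the sum of the players' Presumptive Values. The sketch below scores only the six players singled out above, not the full rosters, so its totals are not the team totals from the study. Aaron's and Mathews' 1959 ages (25 and 27) come from outside the text, but both fall in the 25-to-30 band either way. The optional `top_n` argument is a hypothetical hook for limiting a team to its best players.

```python
def presumptive_value(level, age):
    # Age schedule as described in the text; under 19 assumed to be zero.
    if 25 <= age <= 30:
        d = 0
    elif age in (24, 31, 32):
        d = 1
    elif age in (23, 33):
        d = 2
    elif age > 33:
        d = age - 31  # two points at 33, plus one per season after that
    else:
        d = {22: 4, 21: 6, 20: 8, 19: 9}.get(age, level)
    return max(0, level - d)


def team_total(roster, top_n=None):
    """Sum of Presumptive Values for (name, level, age) entries;
    top_n optionally keeps only the best N players."""
    values = sorted((presumptive_value(level, age) for _, level, age in roster),
                    reverse=True)
    return sum(values[:top_n] if top_n is not None else values)


dodgers_named = [("Maury Wills", 7, 26), ("Duke Snider", 9, 32), ("Sandy Koufax", 10, 23)]
braves_named = [("Henry Aaron", 10, 25), ("Eddie Mathews", 10, 27), ("Warren Spahn", 10, 38)]
print(team_total(dodgers_named), team_total(braves_named))
```

At their prime levels the Braves' three would be worth 30 points and the Dodgers' three 26. Age takes 7 points off the Braves (all from Spahn) but only 3 off the Dodgers, and the two groups come out even at 23 apiece.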

More than 40 years ago, I was sitting around at work on a warehouse loading dock, and I got into a debate with my friend Chris Ketzel about the 1950 Phillies. In essence, I was arguing "Why didn’t they ever win again?", while Chris was arguing "How did they even win once?" I was arguing that there was a lot of talent there (Richie Ashburn, Robin Roberts, Curt Simmons, Willie Jones, Granny Hamner, Del Ennis, Andy Seminick), and Chris was arguing that the Dodgers had a lot MORE talent.

This system says quite affirmatively that Chris was right, and I was wrong. Here are the Phillies’ Presumed Value Team Totals from 1950 to 1956:

1950	1951	1952	1953	1954	1955	1956
58	58	68	64	65	72	77

This system argues persuasively that the Phillies had a far stronger roster, and a better chance to win, later on in the decade, but that their 1950 pennant really was a. . .fluke may not be the right word. It was a remarkable thing.

The 1950 Phillies were known as the Whiz Kids. They had a group of very young players who played together (and roomed together) in the minor leagues, came to the majors together and won the pennant—actually not unlike the 2016 Chicago Cubs, but more so. This method explains their success to me in a way that I have never really understood it before.

First of all, 58 points of Presumed Value is a very, very low total for a pennant-winning team. That’s a total for a 6th- or 7th-place team in an eight-team league, not for a pennant winner. Even their total in 1956—77 points—is not enough to make them a favorite; that’s like a .500 or .550 team, basically. But they were able to win for two reasons. One is that a whole bunch of relatively young players had prime seasons before you would expect them to do so. In 1950 Curt Simmons was 21, Richie Ashburn was 23, Robin Roberts was 23, Granny Hamner was 24, Willie Jones was 24, Del Ennis was 25—but they all had prime seasons. That made the team stronger in fact than it was on paper.

And the other thing was, a 33-year-old veteran pitcher who came into the season with little history of success behind him and little future of success ahead of him, Jim Konstanty, won the MVP Award. Those things happened, and then too the Phillies outperformed their Pythagorean projection by 3 games—a small thing, but they needed it—while the Dodgers underperformed; the Dodgers were +123 runs and the Phillies were +98, but the Phillies won more games. Adding it all up, Chris Ketzel was much more right than I was: it was just a miracle season. As the decade progressed the Phillies should have gotten stronger, but in fact they went backward.
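The Pythagorean projection mentioned above estimates a team's winning percentage from runs scored and runs allowed; in its classic form the exponent is 2. The run totals below are hypothetical, chosen only so that the two teams have the +123 and +98 run differentials quoted in the text. They are not the actual 1950 totals, and the 154-game schedule is the one used in that era.

```python
def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected wins: games * RS^x / (RS^x + RA^x), classic x = 2."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def wins_above_pythagorean(actual_wins, runs_scored, runs_allowed, games):
    """Positive if a team won more games than its runs would suggest."""
    return actual_wins - pythagorean_wins(runs_scored, runs_allowed, games)


# Hypothetical teams: one at +123 runs, one at +98 runs, 154 games each.
print(round(pythagorean_wins(770, 647, 154), 1))
print(round(pythagorean_wins(720, 622, 154), 1))
```

The team with the bigger run differential projects to about two more wins, so the team at +98 can finish ahead only by beating its projection while the other falls short of its own.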

In 1954 we reach a similar conclusion; the 1954 Dodgers actually had a much stronger roster than the Giants did, 89 points to 69, although the Giants won the pennant. When I was younger, years ago, there was controversy over how much credit Leo Durocher deserved for winning those pennants with the Giants in ’51 and ’54, controversy driven by the fact that Leo was a self-promoter who was on television and who would tell you flat out that he was a really good manager. But as to the issue of how much credit is due to Leo for those two pennants, this method suggests that the answer is "quite a lot". The Dodgers’ roster was actually quite a bit stronger than the Giants’, and of course the Dodgers won more than two pennants. But the Giants probably shouldn’t have won even the two that they did.

By the early 1960s, that’s different; the Giants’ talent base started growing steadily stronger year by year beginning in 1956, and by 1962 they were actually the Dodgers’ equal or superior. In 1962 I have it San Francisco 99, Dodgers 98. . . .very much the same as the actual outcome.

Understanding baseball history is important to me, so a better understanding of the pennant races of the 1950s is important to me, while I recognize that it is impossibly remote to a younger person. This is going in a really interesting direction, but at this point I am going to terminate this study, because I can now see several things that I should have done differently than I did them. It seems to me that to continue with this work, rather than to go back to the beginning and fill out the boxes a little better, would be a waste of effort. In the next iteration of this study:

First, I will study "actual value" and "presumptive value" on parallel scales, so that we’ll be able to say who it was who underperformed for teams that underperformed and who overperformed for teams that overperformed. I realize now that it would be fairly easy to do these two things at the same time.

Second, I think I’ll weight the "peak value" at four times the sum total of the three best years, rather than five times, as I did here. The "peak value" can pick up on fluke seasons a little bit; multiplying the peak multiplies the flukes. At 5X value, the peak value total tends to be almost always larger than the career value total, whereas I had intended for them to be about equal.

Third, I think I’ll change from a "10 to 0" scale to a "20 to 0" scale or maybe 25 to 0, to give me more space to deal with what I will call the Marv Throneberry problem.

Fourth, I may convert the maximum value into the presumptive value by applying a PERCENTAGE based on age, rather than by subtracting points from the maximum, although there is an advantage to the subtraction method. (Good players and great players retain value much longer, thus retain a much higher PERCENTAGE of their value as they age. Lesser players go to zero and disappear, usually by age 33. Great players last until they are 40. The "countdown" system describes that better. I may use a blended/compromise system for aging.)
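To see the trade-off, here is a sketch comparing the countdown (subtraction) rule with one possible percentage rule and a 50/50 blend. The percentage and blend rules are hypothetical, not a method from the text: the percentage rule simply lets every player keep the same fraction of his prime level that a "ten" keeps at that age under the countdown rule.

```python
def subtraction_value(level, age):
    """Countdown rule: subtract points from the prime level by age
    (ages under 19 assumed to give zero)."""
    if 25 <= age <= 30:
        d = 0
    elif age in (24, 31, 32):
        d = 1
    elif age in (23, 33):
        d = 2
    elif age > 33:
        d = age - 31
    else:
        d = {22: 4, 21: 6, 20: 8, 19: 9}.get(age, level)
    return max(0, level - d)


def percentage_value(level, age, reference_level=10):
    """Hypothetical percentage rule: keep the fraction of prime value
    that a reference-level player keeps under the countdown rule."""
    return level * subtraction_value(reference_level, age) / reference_level


def blended_value(level, age, weight=0.5):
    """Hypothetical compromise: weighted average of the two rules."""
    return weight * subtraction_value(level, age) + (1 - weight) * percentage_value(level, age)


for level in (10, 5):
    print(level, subtraction_value(level, 36), percentage_value(level, 36),
          blended_value(level, 36))
```

At age 36 a "ten" is worth 5 under every rule, but a "five" is worth 0 under the countdown, 2.5 under the percentage rule and 1.25 under the blend. That is the point made above: a pure percentage keeps lesser players alive at ages when they have usually left the game.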

Fifth, I think I will systematically limit each team to their 16 best players or something. (Teams don’t have 25 players whose presumed value is such that it meaningfully impacts the pennant race. They usually have about 12. After that it is window dressing.)

Sixth, in this version of the study we can only study teams from ten years ago or more, because the ratings for players in mid-career are screwy. Miguel Cabrera in this study is a "9". It is relatively obvious that he should be a "10"; he’s just a few points behind the standard of being a 10, and he will probably pick up those points as he finishes his career. Or Mike Trout; Trout is also a "9", although he obviously will be a "10". Because of this problem I figured I couldn’t do recent teams; I could only do teams from ten years ago or more, but it shouldn’t be that difficult to make adjustments for players in mid-career to get more reasonable figures so that we could evaluate, not the 2017 season, but the 2012 season at least.

Seventh, and most important of the seven changes, I am going to add a third element to the presumptive value estimate for the season, which is the player’s established value coming into the season. This is done to contain the damage of what I will call the Mike McCormick/Vada Pinson problem.

The Marv Throneberry Problem is this. Mickey Mantle and Marv Throneberry are both on the 1959 New York Yankees. Mantle’s Presumptive Value is 10. Throneberry’s is 2.

It’s not exactly the right ratio. Five Marv Throneberrys do not equal one Mickey Mantle. It’s not a crazy comparative value; Marv was a pretty good power hitter. Comparing Mantle to Marv, Throneberry had more than one-fifth as many homers, more than one-fifth as many RBI and more than one-fifth as many runs scored—in fact, more than one-fourth as many. From 1958 to 1960 Throneberry had 568 at bats, and with those 568 at bats he hit 26 homers and drove in 82 runs, drew 60 walks and scored 86 runs. He could hit a little.

Still, his presumptive value in the pennant race is not one-fifth of Mickey Mantle’s, and of course I am not just talking about Marv Throneberry, but about an entire class of players like Marv Throneberry, who had some decent years as bench players and got a shot as regulars but failed; that’s what a "2" is. If Mantle is a "25" and Throneberry is a "3" I’ll be less worried about it.

The Mike McCormick/Vada Pinson problem is this. Mike McCormick was a Bonus Baby pitcher in 1956, a 17-year-old major leaguer, who went on to have a decent career; he led the National League in ERA in 1960 and won a Cy Young Award in 1967. Good career. In our system he is a "5".

However, because he is a "5" in his prime seasons, that means that he is a "zero" when he is 19 and 20 and 21, because you don’t normally expect a player of that quality to have impact on a pennant race when he is 21 years old. He becomes a 5 in 1964.

Well, but McCormick in fact pitched pretty well as an 18-year-old, was a rotation anchor as a 19-year-old (11-8 in 28 starts, also made 14 relief appearances), and pitched 253 innings and led the National League in ERA when he was 21 and his Presumptive Value was zero. When he was 23 and his presumptive value was 3, he pitched 99 innings with a 5.36 ERA, and when he was 25 and his presumptive value was 5, he pitched only 17 innings and did not win a game, 5.29 ERA. That means that McCormick’s Presumptive Value was zero when he was actually pitching 250 innings a year and pitching well (1960-1961), but that his Presumptive Value was five when his actual value was zero.

Vada Pinson was similar; Vada Pinson was an 8 in our system, which is a near-Hall of Fame Level. . .a lot of "eights" are Hall of Famers, and a good many 6s and 7s are Hall of Famers, even a couple of "fives" are Hall of Famers. Anyway, Vada was an 8, but that means that in 1959, when he was 20 years old, his Presumptive Value was zero. In 1959 Vada Pinson hit .316 with 20 homers, 47 doubles and 131 runs scored, not really a zero-value season. In 1961, when he hit .343 and was third in the MVP voting, his Presumptive Value was only 4. He had 205 hits in 1959, 208 in 1961.

Vada was a great player when he was 20 and 22, but he never hit .300 after age 26, and when he was 28 he hit just .271 with 5 homers. By the time he was 30 he was a journeyman. That happens sometimes; it’s not a moral failing, it just happens. But that means that Vada was a great player when his Presumptive Value was zero, but a small-impact player when his Presumptive Value was eight.

In a sense I am not bothered by this; Presumptive Value is not about what the player did, but what you would have expected him to do. You would have expected McCormick to pitch better in 1964; you would have expected Pinson to play better after he was 26. The system is not lying; it is merely recognizing a different truth.

I am not bothered by that for the 1959 season, but I’m a little bothered by it for the 1961 season. Pinson’s 1959 season came out of nowhere and could not have been expected. But by 1961, his performance was not a shock to anyone. The same with McCormick; people were shocked by his performance when he was 19, but by the time he was 21, they knew he was good. This probably should impact how we view his "presumed" value.

So in the next iteration of this study, I think I need to modify the Presumed Value for the season by the player’s Established Value before that season. I don’t know yet quite how I will do that, but I know that I can do that and will figure something out.

There’s another problem here that I don’t know how to deal with; I don’t know what I’m going to do about it. In early 1961 the Los Angeles Dodgers traded Don Demeter to the Phillies for Turk Farrell; there were also a couple of lesser players in the trade. Demeter and Farrell were both pretty good players; in our system Demeter is a "4" and Farrell is a "5", like Mike McCormick who actually had a kind of similar career.

But when I add up the "Presumed Value" for all of the players who were on the 1961 Los Angeles Dodgers or the 1961 Phillies, Demeter and Farrell wind up on both lists. This probably seems to you like a simple problem, but it isn’t a simple problem, I don’t have any idea how I will deal with it, and I suspect that I won’t be able to get rid of that one in the next version of the study. I don’t have any list of transaction dates or any transaction information in the data bases that I use to study these things, so I don’t have any systematic way of removing these conflicts. It would take a year to do them on a case by case basis.

I don’t have any way of recognizing that a player is on two lists and should only be on one, and also, it’s not as simple a problem as you might imagine. If it’s a 4 and 5 I could deal with that if I had some information in my data base that I don’t have, but what if it’s a 4 and a 1? That does happen; teams do trade players who, in retrospect, were nowhere near equal in value. You would have to be careful not to write the rules so that you in effect give a pass to a team which has a "4" on their roster and trades him for a "1".
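Flagging the duplicates is the easy half of the problem, provided each season's rosters are available as lists of names. The sketch below uses partial 1961 lists (only a few names, not full rosters), and the helper name is hypothetical. It only detects players listed by more than one team; it does not decide how much of a player belongs to each, which is the hard part described above. Splitting by games played with each club would be one possibility, but that needs data this sketch does not have.

```python
from collections import defaultdict


def players_on_multiple_teams(rosters):
    """rosters: {team: iterable of player names} for a single season.
    Returns {player: sorted list of teams} for players on 2+ lists."""
    teams_by_player = defaultdict(list)
    for team, players in rosters.items():
        for player in players:
            teams_by_player[player].append(team)
    return {p: sorted(ts) for p, ts in teams_by_player.items() if len(ts) > 1}


# Partial 1961 lists, for illustration only.
rosters_1961 = {
    "Dodgers": ["Sandy Koufax", "Don Demeter", "Turk Farrell"],
    "Phillies": ["Robin Roberts", "Don Demeter", "Turk Farrell"],
}
print(players_on_multiple_teams(rosters_1961))
```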

I need to get back to other work projects. I’ll put this aside and try to get back to it within a month or so. What would be ideal would be if there was some young person who had better programming skills than I have who would take up the problem and carry it into the future. That kind of stuff happens, but it doesn’t happen when you want it to happen; it happens on its own schedule. We’ll hope for the best.