1. The Attribution Problem
In Baseball and In Life
We attribute the victory won by the team to the individual pitcher—and then conclude, based essentially on that attribution, that the pitcher is the key to victory. It sounds silly, but people have been misled by this attribution problem for a hundred years.
In baseball the pitcher attribution problem is relatively simple to understand, if you can just let go for a moment of what you think you know, and drift along with the flow of the logic. A similar fallacy in baseball is the confusion of run creation from the standpoint of the team with runs batted in from the standpoint of the individual. From an offensive standpoint, which is half of the game, the goal of the team is to generate as many runs as possible. The handiest traditional instrument to measure the runs created by each individual is his runs batted in, his RBI total.
The problem is, this often leads to confusion between RBI and actual offensive value; again, a confusion between the individual accomplishment and the good of the team. Most runs result from a sequence of actions by several hitters. The number of runs a team scores depends on how many long sequences they can muster, and thus on how many people they get on base. The RBI count is essentially an indicator of who stands where in the offensive sequence. I’m overstating the case (great RBI men are usually great hitters), but maybe you get my point?
When a new manager or general manager takes over a bad team, what he will very often say is “the thing we need most on this team is an RBI man.” In reality, the vast majority of the time, an RBI man is the last thing that a bad team ought to be worrying about. What they almost always need is more people on base—thus, more people contributing to the sustained sequences of events that lead to three-run innings.
What a manager or general manager is really saying, when he says that “what we need here is an RBI man,” is “we need somebody else here to take credit for the meager successes of this team.” It’s the wrong thing to worry about. If the team is more successful, somebody will get the credit for it, and you can worry later about who.
It’s an attribution problem. Managers, media and fans are prone to attribute to the individual what is actually accomplished by the team—and thus, are prone to recommend changes in personnel that are really of no use at all to the team.
But is the attribution problem unique to baseball? Not at all; in fact, the same sorts of attribution problems occur throughout American life. In America (and certainly in other countries, but I don’t know anything about other countries) we are constantly trying to fix something that isn’t really the problem. Take children playing unsupervised in city parks. Parents in America in the 1950s routinely allowed children as young as six to play around the neighborhood. There were news stories about child-snatching, more news stories about child-snatching, more grisly and horrific stories about children seized by predators and never again seen alive. Eventually we all quit allowing our children to play around the neighborhood until they were 23.
Am I saying that you are wrong to protect your children? Of course not. You, me, any of us would do and will do everything we can to protect our children.
What I am saying is, addressing the problem from the standpoint of the individual doesn’t really do anything at all to fix the problem from the standpoint of society as a whole. You have just as many people who will victimize children after you do that as you did before. Adding an RBI man to a bad team doesn’t really do anything at all to improve the team’s ability to produce runs. If you add a low-average power hitter to a bad team, the low-average power hitter will lead the team in RBI—and the team will score fewer runs, not more. For essentially the same reason, protecting your own child from child predators doesn’t really do anything at all to reduce the problem of child predators. In fact, in some ways, it makes the problem worse.
How does it make the problem worse? Suppose that you see a small child, a six-year-old child, playing alone in the park. In the 1950s, this would not have been at all unusual; now, it would be extremely unusual. So what do you say to yourself, if you see a young child playing unsupervised?
You think to yourself, “My God, that child shouldn’t be playing in the park without supervision”—even though you yourself may well have done the same thing when you were that age. I certainly did.
“Yes,” you think to yourself, “I did that, but the world has changed.” Bingo. That’s how we’ve made the problem worse. We’ve created the idea that snatching a child out of the park if he is left unsupervised is a sort of normal and natural risk—rather than an extraordinary event. That doesn’t make me or you any more likely to grab a child out of the park, because you and I aren’t going to do that anyway. But to a potential child predator, the idea that this is a sort of normal and natural risk is a form of permission. In our world, an unsupervised child is a sort of advertisement for a child molester. It shouldn’t be that way. Children playing should be a normal sight—even if they are unsupervised.
This syndrome of changing public behavior for no real public benefit has been repeated in many different areas. I remember once, when I was maybe eight years old, we were driving home from the nearby town, my nine-year-old sister and I in the car. We happened across two black men whose car had broken down, and who were hitchhiking into town. My father was something of a racist, not a virulent racist, but. . .he wasn’t Spencer Tracy. All the men of that generation that I knew were somewhat racist—yet he stopped to pick up the hitchhikers.
Why? Because, at that time, you just did. You saw somebody in need of a ride, you gave them a ride. That’s the way it was.
That was maybe 1958. Within a year, I remember reading news stories in the paper, Altoona Man Killed by Hitchhiker, Tennessee Woman Assaulted, Slain by Hitchhiking Soldiers. “Assault” was newspaper code, at that time, for rape. Ann Landers began counseling her readers: For heaven’s sake, don’t pick up hitchhikers. You never know what they might be up to. By the early 1960s radio announcers were warning people not to pick up hitchhikers. By the late 1960s there were public service announcements from the police, pleading with people not to pick up hitchhikers—although I found myself, from time to time, sticking up a thumb by the side of the road. I remember in the early 1970s my brother-in-law Ned West would still stop and pick up hitchhikers, and my sister Rosalie, his wife, would get furious about this; they’re both dead now, and a hitchhiker didn’t kill either one of them. By the late 1970s hitchhiking was nearly extinct.
Of course there are other factors in this. A lot of young people hitchhiked in the 1950s because that was the only way they could get around. I knew people in the 1950s who worked ten miles from their home, and hitchhiked to work every day. Everybody has a car now, and the cars are vastly more reliable than they used to be; used to break down by the side of the road regular as rain.
But my point is, eliminating the practice of hitchhiking didn’t really do anything at all to reduce the incidence of violent crime. Society derived no benefit whatsoever from this change. There are, in the world, a certain number of violent people who commit random crimes until they are arrested and put away. Whether people pick up hitchhikers or whether they don’t, that number is exactly the same, and the number of crimes they are going to commit before the short arm of the law catches up with them is essentially the same.
I don’t pick up hitchhikers either; what do you think, I’m crazy? We all acted rationally to protect ourselves—yet the chance that we would be killed by some random nut was the same afterward as it was before. In the same way that adding an RBI man to a bad lineup simply changes who gets credit for creating the runs, eliminating hitchhiking simply changes the details of the crimes which are committed. If you started picking up random strangers, somebody would probably rob you or worse within six months—but if we all started doing it together, we would all be just as safe as we are now.
Here’s another example of the principle of misattribution to the individual. It used to be that, when people didn’t feel well, they went to the hospital. A mother, after delivering a baby, might spend a week in the hospital, getting a good night’s sleep every night, being taken care of by nurses, and letting her body recover. It might have cost $5 a night to stay in a hospital (literally), and there are still women around who remember how much they enjoyed that week, relaxing and letting other people worry about the kids.
When I was in the third grade I had my tonsils out. I spent three nights in the hospital. We were paupers, but. . .if you needed medical care, you went to the hospital.
Hospital costs began to rise. Hospitals began charging more for a night in the hospital. The “bed” wasn’t really what you were paying for; it was just an accounting mechanism. You were actually paying for malpractice insurance, drug research, drug representatives, medical school and hi-tech equipment, but this was all paid for on a per-bed basis.
So the insurance companies, to reduce their costs, began pushing people to spend fewer nights in the hospital. A week spent recovering from childbirth became four days, became three days, two days. . .eventually the government had to step in and legislate at least one day, or the insurance companies would have insisted that you drop the kid and run. The three-day hospital stays to have your tonsils out dropped to about half an hour.
As the number of the “beds” sold by the hospital diminished, the price for each bed had to increase. . .$50 a night, a hundred, two hundred; eventually it cost hundreds or even thousands of dollars to rent a hospital bed for one lousy night. When this no longer served to defray the hospital’s costs, they began charging fantastic fees for other things. An aspirin from a hospital in the 1950s was free, then it cost a quarter, 50 cents, a dollar, $20. God knows what an aspirin costs anymore; only God could afford one.
Of course, reducing the number of hospital beds the insurance company is paying for does nothing whatsoever to reduce the costs of medicine, because you’re not really paying for the bed. That’s just an accounting trick. Like RBI. The health care system was being fooled by its own accounting tricks.
If one insurance company stopped paying attention to the cost of a hospital bed, that insurance company would go bankrupt in a matter of weeks. But if all the insurance companies stopped paying attention to the cost of a hospital bed at the same time, it would make virtually no difference to any of them, since the overall costs of medicine, which is what the companies are really paying for, would be spread among the insurance companies the same afterward as before. In fact, the focus on the cost of a “bed” makes the problem worse, in this way: that if people could afford to go to the hospital when they don’t feel well, they would. If a hospital bed had a reasonable cost, sick people would check themselves in. But now, you don’t go to the hospital unless you’re desperate; thus, the number of people sharing the cost of medicine is artificially reduced, and the cost is pushed artificially higher once again.
And, because the number of beds used is reduced, it no longer makes sense for the hospitals to maintain beds. What would happen if there was a serious pandemic in which people, for some reason, needed to be kept near their doctors? There are no longer hospital beds in which to treat people.
The misattribution problem is not causing hospital costs to explode. Hospital costs are exploding for other reasons. But misattributing those costs to beds (or to aspirin, or whatever) is making the problem worse.
Yesterday I got this question in the “Hey, Bill” section. . .a coincidence, in that I had already written the above:
As you have college age kids, I wondered if you have thoughts on the costs of higher education. I think we should contract some of the college departments that offer programs with little hope of leading to future employment, but my concern about that is the departments in demand today will not necessarily be the same as what will be in demand in 30 years. What do you think? And how much should we expect our students to fund their own education? Is a system that charges $50,000 in tuition annually but offers a lot of financial aid a good thing? In the public colleges, what should be the ratio of student:state funding?
The question is from Michael Kirlin. As to the question of whether I have any thoughts on the costs of higher education, the answer is: Will the government please stop trying to make higher education more affordable?
It’s the same problem. . .and understand, I am speaking as someone who could never possibly have gone to college without the help of the government, and I am extremely grateful for that help. If there is a day in my life when I forget to be grateful for that, I should be ashamed of myself.
But it’s the same problem; what is good policy on the individual level is insane government policy on the macro level. College is expensive. People need help to afford it. The government steps in to provide help.
What this does is, it increases the amount of money available to purchase higher education, which causes the cost of higher education to rise, which makes college more unaffordable to more people. The demand for higher education rises; the price increases, so the government needs to do more, and more, and more. We allocate ever more money to help more people go to college, and the cost of a college education goes higher, and higher, and higher. Every dollar that is spent to make college education more affordable makes it more unaffordable. The same with health care. Every dollar the government spends to make health care more affordable increases the cost of health care by one dollar, but we’re trapped in a cycle of trying to do more, trying to do more, trying to do more.
Everything that government spends money to make more affordable, without exception, becomes dramatically more expensive. What one can do to make college more affordable is exactly the opposite of that suggested by Mr. Kirlin: work to increase the supply. If the government spends money to increase the supply of higher education—which, in fairness to the government, it does—that does make education more affordable. Working to increase the supply of education is like increasing the team on base percentage. Working to make education more affordable to individuals is like trying to find more RBI men.
One more example, and I’ll stop. Public stadium financing. Let us say that Alston, Brighton, Cambridge and Danvers are all cities of 1.8 million each, and they would all like to have an NFL team, which at present Cambridge and Danvers do but Alston and Brighton do not. Alston thinks “To attract an NFL team, we need to build a $400 million stadium,” and so they do. But whose NFL team do they attract? The voters of Cambridge are in effect told “We have an offer of a $400 million stadium. Build us a $500 million stadium, or we’re leaving.” Brighton wants in on the action, so they build a $600 million stadium.
The only thing is, after billions of dollars are spent to build stadiums, there are no more teams than there were before. What is rational policy for each city is utterly irrational from the standpoint of all the cities—spending taxpayer money to enrich the NFL.
Kay Barnes was elected Mayor of Kansas City about ten years ago, on a program of “let’s build a beautiful new arena, and attract an NBA team to come back to Kansas City.” She wasn’t a bad mayor, really. We have the beautiful new arena now. The Kansas City Innovators, maybe you’ve heard of them? I think they’re in the NBA finals now.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2. Doubles and Career Length
I apologize for posting this here; this is a response to comments posted by readers in response to last week’s blog. I attempted at least five times to post this in the comments section there, where it belonged, but the computer absolutely refused to accept it, so I gave up and moved it here.
On the possibility of someone breaking the doubles record, there is an argument made that this is less likely because more players now go to college before entering pro baseball, which shortens their careers. That may be true on one level: there may be more players now who go to college, and it may be that college players are less likely to have very long careers.
But the number of players having very long careers isn’t going down; it’s going up, and going up very rapidly. To break the doubles record, you would need to play at least 2,500 career games. (Paul Waner, who played 2,549 games, had 3,152 hits, and Sam Crawford, who holds the career triples record, played 2,517 games.)
In all of baseball history up to 1970 there were only ten retired players who had played in 2,500 games (plus two active in 1970 who were past that mark). Since 1970 there have been 37 players who played 2,500 or more games: 8 who retired in the 1970s, 9 who retired in the 1980s, 9 in the 1990s, and 11 who have played since 2000. Make it 12; I think Gary Sheffield just moved past that number. More players active since 2000 have played 2,500 career games than in all of baseball history up to 1968.
The number of players having very long careers responds to many external variables—college, perhaps, but also the length of the schedule, the DH rule, expansion, financial incentives to keep playing, and better health care. The balance of these factors is not reducing the number of players who play long enough to hit 800 doubles; it is, in fact, dramatically increasing it.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
3. The Sandberg Game
How many of you got to see MLB-TV’s re-broadcast of “The Sandberg Game”, the June 23, 1984 game in which Willie McGee hit for the cycle and drove in 6 runs, but Sandberg went 5-for-6 with two homers and 7 RBI? It’s a great game, and it’s a lot of fun to watch. A few notes about it:
1) Do you notice the dust in the infield? If you ever see films of Maury Wills sliding into second base, you often see a huge cloud of dust rising up around second when he slides. Twenty years later (1984) there is less dust around the bases than there would have been in Wills’ time, but still more than there is now. It’s interesting that we have just gradually gotten rid of that, without focusing on it.
When did they start sprinkling the base areas in the middle of the game? I don’t know; I know they were doing that by the late 1970s. I think it started with the artificial-turf stadiums of the seventies, which had no dirt except around the bases, and I believe that they started sprinkling the dirt around the bases so that it wouldn’t blow onto the artificial turf and get the turf dirty.
In fact, weren’t there some artificial turf stadiums, in the late 1960s, which had artificial turf even around the bases? Did the Astrodome, when it opened, have dirt areas around the bases? Does anyone know?
2) Ralph Citarella was making his first major league start. When he left the game he was up 9-3, in position to win, but the Cardinal bullpen let it get away. When the Cubs tie the score, Bob Costas observes, “So Ralph Citarella’s first major league win will have to wait for another day.” That day would never come. Citarella never won a game.
3) Costas raves throughout the game about Ozzie Smith’s defense, but to a modern viewer the plays look really ordinary. At one point there’s a ground ball up the middle, Ozzie fields it, and Bob says “Keith Moreland has to be thinking that if there’s anybody else playing shortstop, he’s got a base hit.” But the play is two steps on the shortstop side of second base, not hit all that hard, and you’d expect any modern shortstop to make that play. I wonder if Ozzie was. . .and shortstops were. . .more over in the hole than they are now? Don’t quite get it. Maybe it is just this game; if I watched a different game, maybe I’d think Ozzie was incredible.
4) Man, everybody is skinny.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Nostalgia is one of the greatest enemies of the truth.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
4. Explaining Win Shares and Loss Shares
I have been held up explaining the methodology of Win Shares and Loss Shares by three things:
1) I needed to find time to re-think and re-work the “Saves Approximation of Leverage Index”, which appeared to be too small,
2) It’s boring. It’s boring to write, and it will be somewhat boring to read, and
3) It takes an awful lot of time.
I made a couple of adjustments to the Leverage element for saves, so we’re good to go on that, and I think what I’m going to do is explain it one piece at a time in these Monday morning blogs. Then, when (and if) I get the whole thing explained, we can pull out the pieces and put them together into one explanation. Not that anybody is going to want to read it. The data is more interesting than the method.
OK, the Win Shares method for hitters (for hitting) can be reduced to seven steps:
1) Figure how many runs the player has created,
2) Figure how many outs he has made,
3) Evaluate his offensive context,
4) State this as a productivity ratio,
5) Figure an offensive winning percentage,
6) Assign him a number of games that he is responsible for,
7) Multiply the offensive winning percentage times the games for which he is responsible.
1. Figuring Runs Created
There are many runs created formulas, as you know, and it doesn’t matter a whole lot which one you use. This is what I am using:
A factor: Hits + Walks + Hit By Pitch – Caught Stealing – Grounded Into Double Play
B factor: Singles times 1.125, + Doubles times 1.69, + Triples times 3.02, + Home Runs times 3.73, + (Walks + Hit Batsmen - Intentional Walks) times .29, + (Sacrifice Hits + Sacrifice Flies + Stolen Bases) times .492, - Strikeouts times .04.
C factor: At bats + Walks + Hit Batsmen + Sacrifice Hits + Sacrifice Flies
Those are the A, B and C factors used back to 1955, and then for earlier years we use different formulas which I’ve explained in other places and won’t repeat here.
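We then modify the A, B and C factors in this way, writing the modified factors as A', B' and C':
A' = A + 2.4C
B' = B + 3C
C' = 9C
Runs Created are then:
(A' * B' / C') - .9C
where the C in the final term is the original, unmodified C factor.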
The reason for these modifications is to keep the player’s runs created elements from interacting with one another, and force them to interact as if with other hitters. If you take a group of nine hitters and figure their individual runs created, without these modifications, then the sum of the individual runs created will be higher than the runs created by the group. But with these modifications the sum of the individuals will be the same as the runs created by the group (without the modifications on the team level), or very nearly so.
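As a rough illustration, the A, B and C factors and the modified formula might be coded as below. This is only a sketch: the `BattingLine` class, its field names and the sample stat line are hypothetical, and it assumes that the final ".9 C" term uses the original C factor rather than the modified 9C.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical container for one hitter's season totals (1955 and later)."""
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    intentional_walks: int
    hit_by_pitch: int
    strikeouts: int
    sacrifice_hits: int
    sacrifice_flies: int
    stolen_bases: int
    caught_stealing: int
    gidp: int


def abc_factors(s: BattingLine) -> tuple[float, float, float]:
    """Return the unmodified A, B and C factors described in the text."""
    singles = s.hits - s.doubles - s.triples - s.home_runs
    a = s.hits + s.walks + s.hit_by_pitch - s.caught_stealing - s.gidp
    b = (1.125 * singles + 1.69 * s.doubles + 3.02 * s.triples
         + 3.73 * s.home_runs
         + 0.29 * (s.walks + s.hit_by_pitch - s.intentional_walks)
         + 0.492 * (s.sacrifice_hits + s.sacrifice_flies + s.stolen_bases)
         - 0.04 * s.strikeouts)
    c = s.at_bats + s.walks + s.hit_by_pitch + s.sacrifice_hits + s.sacrifice_flies
    return a, b, c


def runs_created(s: BattingLine) -> float:
    """Apply the modifications: (A + 2.4C)(B + 3C) / 9C - .9C.

    Assumption: the trailing .9C uses the original C factor.
    """
    a, b, c = abc_factors(s)
    return (a + 2.4 * c) * (b + 3 * c) / (9 * c) - 0.9 * c


# Made-up stat line, purely for illustration.
example = BattingLine(at_bats=600, hits=180, doubles=30, triples=5,
                      home_runs=25, walks=70, intentional_walks=5,
                      hit_by_pitch=5, strikeouts=100, sacrifice_hits=0,
                      sacrifice_flies=5, stolen_bases=10,
                      caught_stealing=5, gidp=15)
print(round(runs_created(example), 1))
```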
In the Win Shares system I modified Runs Created by adjusting for
1) Batting average with runners in scoring position, and
2) Home run frequency with men on base.
But in Win Shares and Loss Shares I didn’t do this. It’s not that I feel that these adjustments were improper; there’s a good argument to be made on behalf of those adjustments. But the adjustments are debatable, and I was looking for ways to simplify the system, so I threw those out. I didn’t ultimately simplify the system, of course; I eliminated 100 wrinkles and added 200. But I was trying to keep the system cogent and intelligible.
2. Figure Outs Made
Just at bats minus hits, plus sacrifice hits and flies, plus caught stealing, plus grounded into double play.
I think maybe in the Win Shares system I made some team-level adjustment to reconcile the individual outs made to the team’s offensive innings, but again, I dropped that here, in an effort to streamline the system.
3. Evaluate his Offensive Context
We have to start by figuring the Park Factor, which is
(R + OR) / G at home
---------------------------
(R + OR) / G on the road
And we’ll call the Park Factor “P” in the next equation, and the number of teams in the league “N”. We change the Park Factor into an Applied Park Factor by the formula:
P * (N - 1) + (N - 1)
-------------------------
2 * (N - P)
I call these the Park Factor and the Applied Park Factor; other people call them the Park Index and the Park Factor. Whatever. This is a kind of standard methodology for changing a raw park factor into an applied park factor; I think it was first invented by Pete Palmer, and then has been re-invented by many other people. It just basically says that if your park increases run scoring by 10%, that increases your runs scored by only 5%, since you only play half your games at home, only it’s a little more than 5%, because you don’t have your own home parks counted among your road parks. If you get it you get it and if you don’t you don’t, but this isn’t the place to explain it, and I’m going to move on.
The offensive context for the players on a team is the league runs per game, times the Applied Park Factor.
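For readers who would rather see the park arithmetic as code, here is a minimal sketch of the three pieces just described. The function and parameter names are made up for illustration, and N is taken to be the number of teams in the league.

```python
def park_factor(home_scored, home_allowed, home_games,
                road_scored, road_allowed, road_games):
    """Raw Park Factor: (R + OR) per game at home over (R + OR) per game on the road."""
    home_rate = (home_scored + home_allowed) / home_games
    road_rate = (road_scored + road_allowed) / road_games
    return home_rate / road_rate


def applied_park_factor(p, n_teams):
    """Applied Park Factor as written in the text: (P*(N-1) + (N-1)) / (2*(N-P))."""
    return (p * (n_teams - 1) + (n_teams - 1)) / (2 * (n_teams - p))


def offensive_context(league_runs_per_game, p, n_teams):
    """League runs per game times the Applied Park Factor."""
    return league_runs_per_game * applied_park_factor(p, n_teams)


# A park that raises scoring by 10% in an eight-team league (made-up inputs).
print(applied_park_factor(1.10, 8))
print(offensive_context(4.50, 1.10, 8))
```

A neutral park (P = 1) comes out to exactly 1.0 under this formula, and a park with P = 1.10 comes out a bit above 1.05, which matches the "a little more than 5%" description.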
In Win Shares I figured Park Factors by a rolling multi-year average (except for park changes), but that’s an accounting nightmare and it is questionable whether it does more good than harm, so I got rid of that, and just use the single-year park factors.
To explain more than is probably appropriate: if you take 154 games or 162 games and divide them into two sets of 77 or 81 games, the two sets will not be identical. Your team may score 400 runs in one set and 350 in the other, not for any REASON, but just because everything won’t even out. It can happen that a team scores more runs at home than on the road, not for any real reason, but just because they happen to have more good days at home than on the road. A 162-game split is not nearly large enough to remove these random perturbations in the data.
When you make one-year park factor evaluations, there are a handful of cases in baseball history where this causes serious distortions. Perhaps the most obvious one is Fenway Park in 1955. The Red Sox in 1955 scored 470 runs and allowed 395 at home, whereas they scored 285 and allowed 257 on the road. This causes a very large measured Park Factor for the Red Sox in 1955, which causes screwy calculations of individual player value. For example, you may have seen (in my work and in other people’s) the 1955 Boston Red Sox evaluated as having the best pitching staff of any major league baseball team between 1940 and 1980, or something like that, and Frank Sullivan, 1955, evaluated as being better, park-adjusted, than Sandy Koufax in 1963.
Of course, Frank Sullivan in 1955 was not the best major league pitcher of the 1950s, and the Red Sox in 1955 did not have the best pitching staff of the 1950s. It’s a random data glitch, in which it happened that the offense had almost all of their best days at home, and the pitchers had almost all of their best days on the road.
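To see how large the 1955 Fenway number gets, the raw Park Factor can be worked out from the run totals quoted above. One caveat: the home and road game counts are not given here, so this sketch assumes an even 77/77 split of a 154-game schedule; the real split may differ slightly.

```python
# Run totals for the 1955 Red Sox as quoted in the text.
HOME_SCORED, HOME_ALLOWED = 470, 395
ROAD_SCORED, ROAD_ALLOWED = 285, 257

# Assumption: an even split of a 154-game schedule (not stated in the text).
HOME_GAMES = 77
ROAD_GAMES = 77

home_rate = (HOME_SCORED + HOME_ALLOWED) / HOME_GAMES  # runs per game, both teams
road_rate = (ROAD_SCORED + ROAD_ALLOWED) / ROAD_GAMES

raw_park_factor = home_rate / road_rate

print(f"home: {home_rate:.2f} runs/game, road: {road_rate:.2f} runs/game")
print(f"raw park factor: {raw_park_factor:.3f}")
```

Under that assumption the one-year raw factor lands close to 1.6, which is the kind of reading that inflates the park-adjusted value of the pitchers.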
You can contain the impact of this random data split by using a multi-year park effect, basing the estimate for the Red Sox in 1955 on their home/road data from 1953 to 1957, rather than just on 1955. That is more accurate—in this particular case.
The problem is,
1) It is not necessarily more accurate in other cases, and
2) Even when it is more accurate, it’s a very large amount of work for a relatively small gain.
So I just decided to skip it here, and use the one-year park effect. I may actually intervene in the data (in this case and a few others) to make a “common sense adjustment.” But we’ll leave that issue for another day.
But there’s another adjustment.
And this is new. One of the differences between evaluative systems is in the practice of establishing the norm. Some evaluative systems compare the hitter to other hitters playing the same position; in other words, the “league norm” for a first baseman is what the other first basemen in the league hit. The average first baseman is an average hitter for a first baseman.
I am strongly opposed to this practice, for two reasons. First, it creates very small normative groups, which creates pockets in the data with misleading norms. If you look at the data for Hal Trosky, for example; in his early years he is competing, in an eight-team league, with Lou Gehrig, Jimmie Foxx and Hank Greenberg. This leads to a very high expectation for runs created for a first baseman in the American League in 1933-1937, which causes all of them to be ranked lower than they should be. That’s an extreme example, but it’s a common problem; the norms for leagues are frequently non-representative of the underlying forces that are shaping the game.
And second, in my view, it is simply not true. That method is a way of saying, in essence, that first basemen are, on average, offensively the same (and defensively the same) as shortstops. This is nonsense. First basemen are NOT the same as shortstops. First basemen are better hitters than shortstops; shortstops are better fielders than first basemen. An evaluative system that ranks the average first baseman as being no better hitter than the average shortstop is simply wrong, because the average first baseman IS a better hitter than the average shortstop.
Stating the same thing another way, this “positional segregation” assumes that a first baseman is competing with the league’s other first basemen—which he is not. The runs created by a first baseman and the runs created by a shortstop go into a common pot; they are all competing with one another. In my view, ignoring this difference causes serious distortions in the evaluation of players. Nobody hits as a first baseman. Everybody hits as a hitter.
OK, I still believe that, but I’m a little less dogmatic on the issue now than I was when I was developing Win Shares. The problem with my approach before is that it complicates the evaluation of pitchers as hitters. When you evaluate a pitcher (with no DH rule), you have to include his performance as a hitter, since “hitting by pitchers” is about 3% of the game. When you evaluate pitchers as hitters, they almost all have losing records, typically 0-3 in a season, but not infrequently 0-5 or 0-6.
This imbalances the overall evaluation of pitchers, so that—assuming that an average pitcher is making a .500 contribution as a pitcher—we wind up with total won-lost records for pitchers that skew strongly negative. Three pitchers, with pitching records of 17-11, 12-14, 8-12, overall 37-37. Add in 0-4, 0-3 and 0-3 for their performance as hitters, they’re 17-15, 12-17, 8-15, they’re 37-47. The average pitcher is a .440 player because he can’t hit.
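The bookkeeping in that three-pitcher example is easy to verify. A small sketch, using only the records given above (the helper names are made up):

```python
# (wins, losses) as pitchers and as hitters, from the example in the text.
pitching_records = [(17, 11), (12, 14), (8, 12)]
hitting_records = [(0, 4), (0, 3), (0, 3)]


def add_records(a, b):
    """Add two (wins, losses) records together."""
    return (a[0] + b[0], a[1] + b[1])


def total(records):
    """Sum a list of (wins, losses) records."""
    wins = sum(w for w, _ in records)
    losses = sum(l for _, l in records)
    return (wins, losses)


def winning_pct(wins, losses):
    return wins / (wins + losses)


combined = [add_records(p, h) for p, h in zip(pitching_records, hitting_records)]

print(total(pitching_records))   # pitching only
print(combined)                  # pitching plus hitting, per pitcher
print(total(combined), round(winning_pct(*total(combined)), 3))
```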
How do you deal with that? You “re-balance” their pitching performance to offset their hitting? That’s an accounting quagmire, and it’s hard to see what you’re really doing, anyway.
I dealt with this before, in the Win Shares system, by moving the Runs Created above or below average by a pitcher into his pitching record, accounting for his hitting as if it was a part of his pitching. That didn’t work, either; that led to the problem of pitching-vs.-hitting values not making sense, which I ultimately had to resolve by assuming that baseball was 52% defense, 48% offense.
And there’s another problem here, which is that pitchers often have zero runs created and not infrequently negative runs created. Even the league average may be zero; if not zero, it is some very small number which acts like a zero. Zeroes and negative numbers are nightmares in analytical systems that have to balance. They cause problems everywhere, because you constantly find yourself dividing by zero, or dividing by a negative number, or dividing by some very small number which creates screwy outcomes.
What I eventually decided to do was to carve hitting by pitchers out from the norm—in other words, everybody hits as a hitter, except pitchers, who hit like pitchers. Otherwise, I compare every hitter to league norms—but pitchers, I compare to other pitchers, and by a different method that dodges the zeroes.
But that means that we have to re-balance the league so that it doesn’t include the pitchers. The “non pitcher adjustment” is:
Runs Created per 27 outs by the league’s non pitchers
-----------------------------------------------------------------
Runs Created per 27 outs by the league including pitchers
Typically, that’s a figure about 1.06 or 1.07 in a non-DH league. It’s 1.00 in a pure DH league, and about 1.002 to 1.005 in a DH league that has some inter-league play mixed in.
So the “contextual norm” for a hitter (other than a pitcher) is:
The league average of runs created per out
Times the Applied Park Factor
Times the league non-pitcher adjustment.
4) State this as a productivity ratio.
The player’s Runs Created, divided by his outs, divided by the contextual norm as explained above. An average player will be at 1.000.
That was easy.
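As a rough sketch of steps 3 and 4, the contextual norm and the productivity ratio might be coded like this. The function names and the league figures are invented for illustration; only the three-factor product and the final division come from the text.

```python
def non_pitcher_adjustment(rc27_non_pitchers, rc27_with_pitchers):
    """League runs created per 27 outs without pitchers, over the same rate with them."""
    return rc27_non_pitchers / rc27_with_pitchers


def contextual_norm(league_rc_per_out, park_factor, non_pitcher_adj=1.0):
    """Runs created per out expected of an average non-pitcher in this context."""
    return league_rc_per_out * park_factor * non_pitcher_adj


def productivity_ratio(runs_created, outs, norm):
    """Player's runs created per out, relative to the contextual norm."""
    return (runs_created / outs) / norm


# Made-up league: 4.50 runs created per 27 outs overall, 4.80 without pitchers.
adj = non_pitcher_adjustment(4.80, 4.50)
norm = contextual_norm(4.50 / 27, park_factor=1.02, non_pitcher_adj=adj)
print(round(adj, 3))
print(round(productivity_ratio(90, 400, norm), 3))
```

A hitter who creates runs at exactly the contextual norm comes out at 1.000, whatever the league or park.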
5) Figure an offensive winning percentage.
Pythagorean-type winning percentages don’t work well on an individual level. Suppose that there is a league in which the average non-pitcher creates 5.00 runs per 27 outs. Suppose that in that league there is a team with these two players:
Player A | 96 runs created | 324 outs
Player B | 54 runs created | 486 outs
Taken together, these two players are average. An average player is creating five runs per 27 outs; these two guys have 810 outs, that’s 30 games, 150 runs created.
But if you state their runs created as winning percentages by the Pythagorean method, you have a problem. Assuming that the “Opposition Run” figure for both players is 5.00, Player A would have 8.00 runs created per 27 outs, for a winning percentage of .719; Player B would have 3.00 runs created per 27 outs, for a winning percentage of .265. Assuming we hold each player responsible for one “game” for each 27 outs, Player A would be responsible for 12 games; with a winning percentage of .719 he would be credited with 8.63 wins, 3.37 losses; Player B would be credited with 4.76 wins, 13.24 losses:
Player A | 12 games | .719 | 8.63 – 3.37
Player B | 18 games | .265 | 4.76 – 13.24
Total | 30 games | | 13.39 – 16.61
A .446 winning percentage for the two players, when it should be .500. We’re seriously off course. This is a little bit extreme, but it’s a real and common problem, and switching to some other exponent (other than 2.00) is not going to solve the problem. And there are real-life situations where the problem is worse than this.
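Here is a minimal sketch of the Player A / Player B calculation, assuming one “game” per 27 outs and an opposition figure of 5.00, as in the example. The function names are made up. The loop also tries two lower exponents to illustrate the claim that changing the exponent does not fix the imbalance.

```python
LEAGUE_RC27 = 5.00  # league-average runs created per 27 outs, from the example

PLAYERS = {"Player A": (96, 324), "Player B": (54, 486)}  # (runs created, outs)


def pythagorean_pct(rc27, opposition=LEAGUE_RC27, exponent=2.0):
    """Pythagorean winning percentage from a runs-created rate."""
    return rc27 ** exponent / (rc27 ** exponent + opposition ** exponent)


def combined_pct(players, exponent=2.0):
    """Combined winning percentage, one 'game' per 27 outs."""
    wins = games = 0.0
    for runs_created, outs in players.values():
        player_games = outs / 27
        rc27 = runs_created / player_games
        wins += player_games * pythagorean_pct(rc27, exponent=exponent)
        games += player_games
    return wins / games


for exponent in (2.0, 1.5, 1.0):
    print(exponent, round(combined_pct(PLAYERS, exponent), 3))
```

With the exponent at 2.0 this gives about .446, matching the figure above. The two lower exponents tried here also leave the pair under .500.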
There are two elements to this problem:
1) That good hitters make fewer outs than bad hitters, and
2) That the Pythagorean system punishes bad hitters more recklessly than it rewards good ones.
If a player’s ratio of runs created to the league norm is 1 to 1, that’s a .500 player. If it’s 0 to 1, that’s a .000 player. But if it’s 2 to 1, that’s an .800 player. Taking away 1.00 times the normal number of runs moves the hitter 500 points, from .500 to zero, but adding in the same number of runs moves the hitter up by only 300 points, from .500 to .800, so the system no longer balances.
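The 500-points-down, 300-points-up asymmetry falls straight out of the Pythagorean formula when it is written in terms of the ratio to the norm. A small sketch follows; the function name is invented for illustration.

```python
def pythagorean_from_ratio(ratio, exponent=2.0):
    """Pythagorean winning percentage for a runs-created rate of `ratio` times the norm."""
    return ratio ** exponent / (ratio ** exponent + 1.0)


# Points lost by removing one norm's worth of runs, and gained by adding one.
drop = pythagorean_from_ratio(1.0) - pythagorean_from_ratio(0.0)
gain = pythagorean_from_ratio(2.0) - pythagorean_from_ratio(1.0)
print(round(drop, 3), round(gain, 3))  # 0.5 0.3
```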
Fortunately, there are lots of other ways to state runs scored, outs and runs allowed as a winning percentage—methods that don’t work as well on the team level, but work better on the individual level.
The method I have chosen to state the productivity ratio (P) as a winning percentage is:
W Pct = (P - .2) / 1.6
If your Productivity ratio is 1.000, this makes a winning percentage of .500 (duh.) It makes this chart:
Productivity Ratio | Winning Percentage
1.5 | .813
1.4 | .750
1.3 | .688
1.2 | .625
1.1 | .563
1.0 | .500
0.9 | .438
0.8 | .375
0.7 | .313
0.6 | .250
0.5 | .188
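The conversion is a one-liner. The sketch below (the function name is invented) prints the same range of ratios as the chart. Values that sit exactly halfway between two thousandths, such as .8125, may round down rather than up in floating point, so the last digit can differ from the chart by one.

```python
def linear_win_pct(productivity):
    """Linear conversion from productivity ratio to winning percentage: (P - .2) / 1.6."""
    return (productivity - 0.2) / 1.6


for tenths in range(15, 4, -1):
    p = tenths / 10
    print(f"{p:.1f}  {linear_win_pct(p):.3f}")
```

The formula as written has no floor or ceiling: a ratio above 1.8 would come out above 1.000, and one below 0.2 would come out negative. The text here does not say how such cases are handled.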
Whereas if you use the Pythagorean method, it makes this chart:
Productivity Ratio | Winning Percentage | Pythagorean
1.5 | .813 | .692
1.4 | .750 | .662
1.3 | .688 | .628
1.2 | .625 | .590
1.1 | .563 | .548
1.0 | .500 | .500
0.9 | .438 | .448
0.8 | .375 | .390
0.7 | .313 | .328
0.6 | .250 | .265
0.5 | .188 | .200
Most real players have productivity ratios between .75 and 1.25, and the won-lost computations are similar within that range. But for those exceptional players with productivity ratios in excess of 1.25, they’re different. Taking us back to the case above,
Player A | 96 runs created | 324 outs
Player B | 54 runs created | 486 outs
The winning percentages we would assign them would be:
Player A | .875
Player B | .250
When we assign Player A 12 games (324/27) and Player B 18 games, we wind up with these wins and losses:
Player A | .875 | 10.50 wins | 1.50 losses
Player B | .250 | 4.50 wins | 13.50 losses
Add them together, they’re 15-15, .500. The method does not work as well for teams as teams, but it works much better for teams as collections of individual players.
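The same two players run through the linear formula can be sketched as follows, again assuming one “game” per 27 outs and a league norm of 5.00 runs created per 27 outs. The function names are made up.

```python
LEAGUE_RC27 = 5.00  # league-average runs created per 27 outs, from the example


def linear_win_pct(productivity):
    """Linear conversion from productivity ratio to winning percentage."""
    return (productivity - 0.2) / 1.6


def record(runs_created, outs):
    """Wins and losses for one hitter, at one 'game' per 27 outs."""
    games = outs / 27
    productivity = (runs_created / games) / LEAGUE_RC27
    pct = linear_win_pct(productivity)
    return games * pct, games * (1 - pct)


wins_a, losses_a = record(96, 324)
wins_b, losses_b = record(54, 486)
print(round(wins_a, 2), round(losses_a, 2))                        # 10.5 1.5
print(round(wins_b, 2), round(losses_b, 2))                        # 4.5 13.5
print(round(wins_a + wins_b, 2), round(losses_a + losses_b, 2))    # 15.0 15.0
```

The pair balances because the formula is linear in the productivity ratio and games are handed out in proportion to outs, so the group total depends only on the group’s total runs created and total outs.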
6) Assign the player a number of games for which he is responsible.
In the illustration above I used one game for every 27 outs, but in the Win Shares/Loss Shares system I actually assign one “game” for approximately every 18 outs. What I actually do is:
1. Take the Outs made by the hitter,
2. Divide by the Outs made by the team (on offense),
3. Multiply by 1.5 times the (Wins + Losses) of the team.
In other words, since there are 3 Win Shares for each Win and 3 Loss Shares for each loss, and since 50% of those are assigned to the offense, there are 1.50 Win Shares and Loss Shares for each decision of the team, and those are assigned to the hitters in proportion to the number of outs they made.
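A minimal sketch of step 6, with an invented function name and a made-up team for the check. The 1.5 multiplier is the offense’s half of the three Win Shares and Loss Shares per decision described above.

```python
def games_responsible(player_outs, team_outs, team_wins, team_losses, offense_share=1.5):
    """Offensive 'games' assigned to a hitter, in proportion to the team's outs he made."""
    return player_outs / team_outs * offense_share * (team_wins + team_losses)


# Made-up team: 81-81, with exactly 27 batting outs in each of 162 games.
team_outs = 27 * 162
print(games_responsible(team_outs, team_outs, 81, 81))   # 243.0
print(round(games_responsible(18, team_outs, 81, 81), 3))
```

For a team like this one, 243 offensive games spread over 4,374 outs works out to one game per 18 outs, which is where the “approximately every 18 outs” figure comes from.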
7) Multiply the offensive winning percentage times the games for which he is responsible.
Which I think requires no further explanation.
For illustration, I’ll follow these steps as I compare the career Win Shares and Loss Shares of two great 1980s first basemen.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
5. Cecil Cooper vs. Don Mattingly
Who was a better player: Cecil Cooper or Don Mattingly? We tend to assume that Mattingly was better because he has a Hall of Fame support group, but it’s not that clear. They’re similar players—both 1980s first basemen who hit for very good averages. Both players, in fact, had career-best batting averages of .352—Cooper in 1980 (.352 with 25 homers, 122 RBI), Mattingly in 1986 (.352 with 31 homers, 113 RBI). Both players had similar power—Mattingly hitting 31 to 35 homers three times, Cooper hitting 30 and 32 homers. Cooper drove in 100 runs four times and 120+ three times; Mattingly drove in 100 runs five times, but drove in 120+ only once. Each player scored 100 runs twice. They had careers of similar length—1785 games for Mattingly, 1896 for Cooper. Neither player walked very much. Both players were left-handed hitting and left-handed throwing first basemen. Mattingly had 200 hits three times; Cooper had 200 hits three times. Cooper got a chance to manage, beginning in 2007, and has done remarkably well with the talent he has to work with; Mattingly almost got a chance to manage the Yankees but was turned down. Mattingly’s OPS is a little higher (.830 to .803), but it’s not obvious which one was a better player.
Let’s start by comparing their .352 seasons:
YEAR | Player | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | AVG | SLG | OBA | OPS
1980 | Cooper | 153 | 622 | 96 | 219 | 33 | 4 | 25 | 122 | 39 | 42 | .352 | .539 | .387 | .926
1986 | Mattingly | 162 | 677 | 117 | 238 | 53 | 2 | 31 | 113 | 53 | 35 | .352 | .573 | .394 | .967
And the small stuff, which you will need if you actually want to follow along with all the math:
YEAR | Player | IBB | HBP | SAC | SF | GIDP | SB | CS
1980 | Cooper | 15 | 2 | 7 | 8 | 16 | 17 | 6
1986 | Mattingly | 11 | 1 | 1 | 10 | 17 | 0 | 0
That works out to 121 Runs Created for Cooper, 140 for Mattingly:
Player | RCA | RCB | RCC | RC
Cooper | 238 | 359 | 678 | 121
Mattingly | 275 | 399 | 742 | 140
But Mattingly also has a few more outs, 467 to 440. Cooper’s Runs Created rate is 121/440, Mattingly’s is 140/467.
Then we look at the context. The American League scored a few more runs in 1986 than in 1980:
YEAR | Player | Lg R | Lg Outs
1980 | Cooper | 10,201 | 61,309
1986 | Mattingly | 10,447 | 60,962
And also Yankee Stadium in 1986 seemed to be a better hitter’s park than Milwaukee County Stadium in 1980—Applied Park Factor of 1.029 for Yankee Stadium, .935 for Milwaukee. We are now ready to state the Productivity Ratio for each player:
Cooper: (121/440) divided by [(10201/61309) × .935] = 1.773
Mattingly: (140/467) divided by [(10447/60962) × 1.029] = 1.702
Cooper was 77.3% more effective than an average hitter in the American League in 1980; Mattingly was 70.2% more effective than an average hitter in the American League in 1986. (You have to save a few more decimals than I have shown as you’re working through this.)
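To follow along, the two productivity ratios can be recomputed from the rounded figures shown above. This sketch makes two assumptions: the non-pitcher adjustment is 1.00, since both seasons are in a DH league, and the league runs per out shown in the table serve as the league norm. The function name is invented.

```python
def productivity_ratio(runs_created, outs, league_runs, league_outs,
                       park_factor, non_pitcher_adj=1.0):
    """Player's runs created per out, divided by the park-adjusted league norm."""
    norm = (league_runs / league_outs) * park_factor * non_pitcher_adj
    return (runs_created / outs) / norm


cooper = productivity_ratio(121, 440, 10201, 61309, 0.935)
mattingly = productivity_ratio(140, 467, 10447, 60962, 1.029)
print(round(cooper, 3), round(mattingly, 3))
```

Because the inputs here are rounded, the results land close to 1.773 and 1.702 but not exactly on them, which is the point of the note about saving extra decimals.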
Anyway, we state the Productivity as a Winning Percentage by subtracting .200 and dividing by 1.600:
Cooper | 1.773 - .200 = 1.573 divided by 1.600 = | .983
Mattingly | 1.702 - .200 = 1.502 divided by 1.600 = | .939
.983 winning percentage for Cooper, .939 for Mattingly. We assign each player a number of games for which he is responsible based on the player’s outs made, the team’s outs made, and the decisions of the team, which are 162 in each case. We multiply the decisions times 1.500 (there are three Win Shares and Loss Shares for each win and loss. One-half of those are assigned to the offense, so that’s 1.500 Win and Loss Shares per decision, or 243 for the season.) The player’s responsibility for those depends on the percentage of the team’s outs that he has made.
Cooper | 440 of 4361 = .10089 times 243 = | 24.52
Mattingly | 467 of 4330 = .10785 times 243 = | 26.21
Cooper thus has a .983 winning percentage for 24.52 decisions, or an offensive won-lost record of 24-0 (actually, 24.097 – 0.420), while Mattingly has an offensive won-lost record of 25-2 (actually, 24.600 – 1.608).
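Steps 5 through 7 for the two .352 seasons can be strung together in a short sketch. It starts from the productivity ratios as printed (1.773 and 1.702), so the last decimal or two will differ slightly from the figures above, which carry more precision. The function names are invented for illustration.

```python
def linear_win_pct(productivity):
    """Step 5: productivity ratio to offensive winning percentage."""
    return (productivity - 0.2) / 1.6


def games_responsible(player_outs, team_outs, team_decisions):
    """Step 6: share of the team's 1.5 offensive games per decision."""
    return player_outs / team_outs * 1.5 * team_decisions


def offensive_record(productivity, player_outs, team_outs, team_decisions=162):
    """Step 7: offensive Win Shares and Loss Shares."""
    games = games_responsible(player_outs, team_outs, team_decisions)
    pct = linear_win_pct(productivity)
    return games * pct, games * (1 - pct)


cooper = offensive_record(1.773, 440, 4361)
mattingly = offensive_record(1.702, 467, 4330)
print([round(x, 2) for x in cooper])
print([round(x, 2) for x in mattingly])
```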
We’ll add in the fielding later, and I’ll explain how to do that on some later Monday.
Both players came to the majors late in the season when they were 21 years old:
Year | Player | Age | G | AB | HR | RBI | AVG | SLG | OBA | OPS | WS | LS | W Pct
1971 | Cooper | 21 | 14 | 42 | 0 | 3 | .310 | .452 | .388 | .840 | 2 | 1 | .769
1982 | Mattingly | 21 | 7 | 12 | 0 | 1 | .167 | .167 | .154 | .321 | 0 | 1 | .000
Cooper’s career started slowly, however, while Mattingly’s started brilliantly. From ages 22 to 28, Mattingly far outshone Cooper:
Year | Player | Age | G | AB | HR | RBI | AVG | SLG | OBA | OPS | WS | LS | W Pct
1972 | Cooper | 22 | 12 | 17 | 0 | 2 | .235 | .294 | .316 | .610 | 0 | 1 | .387
1983 | Mattingly | 22 | 91 | 279 | 4 | 32 | .283 | .409 | .333 | .742 | 8 | 8 | .503
1973 | Cooper | 23 | 30 | 101 | 3 | 11 | .238 | .347 | .284 | .631 | 2 | 4 | .335
1984 | Mattingly | 23 | 153 | 603 | 23 | 110 | .343 | .537 | .381 | .918 | 27 | 5 | .851
1974 | Cooper | 24 | 121 | 414 | 8 | 43 | .275 | .396 | .327 | .724 | 11 | 11 | .490
1985 | Mattingly | 24 | 159 | 652 | 35 | 145 | .324 | .567 | .371 | .939 | 28 | 6 | .820
1975 | Cooper | 25 | 106 | 305 | 14 | 44 | .311 | .544 | .355 | .899 | 10 | 5 | .697
1986 | Mattingly | 25 | 162 | 677 | 31 | 113 | .352 | .573 | .394 | .967 | 29 | 5 | .865
1976 | Cooper | 26 | 123 | 451 | 15 | 78 | .282 | .457 | .304 | .761 | 13 | 11 | .529
1987 | Mattingly | 26 | 141 | 569 | 30 | 115 | .327 | .559 | .378 | .937 | 23 | 7 | .774
1977 | Cooper | 27 | 160 | 643 | 20 | 78 | .300 | .463 | .326 | .789 | 19 | 15 | .560
1988 | Mattingly | 27 | 144 | 599 | 18 | 88 | .311 | .462 | .353 | .816 | 21 | 11 | .661
1978 | Cooper | 28 | 107 | 407 | 13 | 54 | .312 | .474 | .359 | .833 | 15 | 8 | .656
1989 | Mattingly | 28 | 158 | 631 | 23 | 113 | .303 | .477 | .351 | .828 | 22 | 12 | .638
Mattingly’s 25-2 won-lost record for 1986 becomes 29-5 when we add in his fielding. Mattingly’s career won-lost record at this point was 157-54 (.745), while Cooper’s was just 73-55 (.567).
At age 29, however, Cooper came into his own, while Mattingly’s back problems began to seriously slow him down. For the next five years, from ages 29 to 33, Cooper performed better than Mattingly:
Year | Player | Age | G | AB | HR | RBI | AVG | SLG | OBA | OPS | WS | LS | W Pct
1979 | Cooper | 29 | 150 | 590 | 24 | 106 | .308 | .508 | .364 | .872 | 22 | 11 | .672
1990 | Mattingly | 29 | 102 | 394 | 5 | 42 | .256 | .335 | .308 | .643 | 9 | 13 | .418
1980 | Cooper | 30 | 153 | 622 | 25 | 122 | .352 | .539 | .387 | .926 | 27 | 6 | .826
1991 | Mattingly | 30 | 152 | 587 | 9 | 68 | .288 | .394 | .339 | .733 | 15 | 16 | .483
1981 | Cooper | 31 | 106 | 416 | 12 | 60 | .320 | .495 | .363 | .858 | 16 | 7 | .713
1992 | Mattingly | 31 | 157 | 640 | 14 | 86 | .288 | .416 | .327 | .742 | 18 | 15 | .542
1982 | Cooper | 32 | 155 | 654 | 32 | 121 | .313 | .528 | .342 | .870 | 26 | 8 | .758
1993 | Mattingly | 32 | 134 | 530 | 17 | 86 | .291 | .445 | .364 | .809 | 18 | 11 | .632
1983 | Cooper | 33 | 160 | 661 | 30 | 126 | .307 | .508 | .341 | .849 | 24 | 12 | .673
1994 | Mattingly | 33 | 97 | 372 | 6 | 51 | .304 | .411 | .397 | .808 | 14 | 6 | .702
This brought Cooper’s career won-lost record to 188-98 (.656), while Mattingly was at 231-114 (.669). At age 34 both players were sub-.500 performers:
Year | Player | Age | G | AB | HR | RBI | AVG | SLG | OBA | OPS | WS | LS | W Pct
1984 | Cooper | 34 | 148 | 603 | 11 | 67 | .275 | .386 | .307 | .693 | 15 | 18 | .450
1995 | Mattingly | 34 | 128 | 458 | 7 | 49 | .288 | .413 | .341 | .754 | 12 | 13 | .472
Their batting averages were still OK, but as first basemen without power, neither was doing more than treading water. Mattingly retired at that point, while Cooper played on for three more years:
Year | Player | Age | G | AB | HR | RBI | AVG | SLG | OBA | OPS | WS | LS | W Pct
1985 | Cooper | 35 | 154 | 631 | 16 | 99 | .293 | .456 | .322 | .779 | 15 | 19 | .445
1986 | Cooper | 36 | 134 | 542 | 12 | 75 | .258 | .373 | .310 | .682 | 10 | 19 | .358
1987 | Cooper | 37 | 63 | 250 | 6 | 36 | .248 | .372 | .293 | .665 | 4 | 9 | .289
In the final analysis, Mattingly’s career rates a substantial edge.
Year | Player | Age | WS | LS | W Pct | | Year | Player | Age | WS | LS | W Pct
1971 | Cooper | 21 | 2 | 1 | .769 | | 1982 | Mattingly | 21 | 0 | 1 | .000
1972 | Cooper | 22 | 0 | 1 | .387 | | 1983 | Mattingly | 22 | 8 | 8 | .503
1973 | Cooper | 23 | 2 | 4 | .335 | | 1984 | Mattingly | 23 | 27 | 5 | .851
1974 | Cooper | 24 | 11 | 11 | .490 | | 1985 | Mattingly | 24 | 28 | 6 | .820
1975 | Cooper | 25 | 10 | 5 | .697 | | 1986 | Mattingly | 25 | 29 | 5 | .865
1976 | Cooper | 26 | 13 | 11 | .529 | | 1987 | Mattingly | 26 | 23 | 7 | .774
1977 | Cooper | 27 | 19 | 15 | .560 | | 1988 | Mattingly | 27 | 21 | 11 | .661
1978 | Cooper | 28 | 15 | 8 | .656 | | 1989 | Mattingly | 28 | 22 | 12 | .638
1979 | Cooper | 29 | 22 | 11 | .672 | | 1990 | Mattingly | 29 | 9 | 13 | .418
1980 | Cooper | 30 | 27 | 6 | .826 | | 1991 | Mattingly | 30 | 15 | 16 | .483
1981 | Cooper | 31 | 16 | 7 | .713 | | 1992 | Mattingly | 31 | 18 | 15 | .542
1982 | Cooper | 32 | 26 | 8 | .758 | | 1993 | Mattingly | 32 | 18 | 11 | .632
1983 | Cooper | 33 | 24 | 12 | .673 | | 1994 | Mattingly | 33 | 14 | 6 | .702
1984 | Cooper | 34 | 15 | 18 | .450 | | 1995 | Mattingly | 34 | 12 | 13 | .472
1985 | Cooper | 35 | 15 | 19 | .445 | | | | | | |
1986 | Cooper | 36 | 10 | 19 | .358 | | | | | | |
1987 | Cooper | 37 | 4 | 9 | .289 | | | | | | |
Total | | | 232 | 162 | .588 | | Total | | | 243 | 127 | .656
We have Mattingly 9 ½ games better as a hitter:
Mattingly as a hitter: 202 - 86 .701
Cooper as a hitter: 201 – 104 .660
But 14 games better as a fielder:
Mattingly as a fielder: 41 - 41 .498
Cooper as a fielder: 31 - 59 .343
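The text does not spell out how “games better” is figured. A sketch that assumes the usual games-behind arithmetic (half the sum of the difference in wins and the difference in losses) reproduces both numbers. The function name is invented for illustration.

```python
def games_better(record_a, record_b):
    """Games by which record_a leads record_b, using standings-style games-behind arithmetic."""
    (wins_a, losses_a), (wins_b, losses_b) = record_a, record_b
    return ((wins_a - wins_b) + (losses_b - losses_a)) / 2


print(games_better((202, 86), (201, 104)))  # 9.5
print(games_better((41, 41), (31, 59)))     # 14.0
```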
Mattingly won nine Gold Gloves; Cooper won two. Our system is set up so that first basemen normally have less-than-.500 winning percentages in the field, although Steve Garvey, who we looked at last week, wound up over .500 in the field (64-57, .526).