Reflections on the Use of Wins and Losses in an Analytical System
(This is the second of two articles taken from my longer unpublished article, outlining how to calculate Win Shares and Loss Shares, using modern defensive numbers.)
I wanted to explain and defend the use of Won-Lost records in the rating of the pitcher. I don’t believe that the other WAR systems use Wins and Losses at all in their determination of a pitcher’s value, and I know that many analysts will not believe that it is appropriate to use them. My friend Brian Kenny has been on a campaign to "Kill the Win", and his new book has a chapter entitled "Kill the Save Too."
I think that I am in substantial measure responsible for the antipathy toward the win in the sabermetric community, due to things that I wrote in the 1980s, and I think that, because of those things, I need to explain why I feel that it is appropriate to use Wins, Losses and Saves, not as a major element in seeing most clearly the contribution of an individual player to the success of his team, but as a minor element.
What I could say and often do say in this debate is that we are not in the business of eliminating information, but in the business of creating it. What I could also say, and do say, is that it is the point of the game to Win. In this effort here, what we are really doing is creating Won-Lost records for every player—carefully, systematically justified and deeply researched Wins and Losses, yes, but still Wins and Losses. To help his team win, and avoid losing, is the exact definition of a player’s job. A won-lost record is, I think, the best possible shorthand notation to summarize what a player has done to help his team win.
But those also are shorthand arguments, arguments you make when you do not have the time or patience to deal with the real issues. The argument against the win, in essence, is that it pretends to represent an ultimate truth, but fails to do so. That is true enough; it fails to do so in many cases. Also, pitchers’ Wins and Losses are very poorly designed, carelessly designed. The rules determining who gets the Win in a contest and who gets the Loss were never really thought through; they just happened. Casual decisions about record-keeping have been carried through for a century and more, when it has long since become apparent that the original decision was poorly made.
But this is the real issue. . . well, no, I am not quite ready to get to the real issue. On the other side of the issue, I should say first that the Won-Lost record has inherent virtues not found in other statistics—and important inherent virtues not found in other statistics. The won-lost record automatically balances at .500 in every season.
This is a tremendous asset. In my lifetime we have had seasons when the league ERA was under 3.00; we have had seasons when it was around 5.00. When you look at a pitcher’s career ERA and see 3.50, that could be 400 runs better than the league average in his era; it could be 300 runs worse. You don’t know.
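As a rough illustration of how much the league context can move the meaning of one ERA, here is a minimal sketch in Python. The function name, the 2,700-inning career and the two league ERAs are assumptions chosen for the example, not figures for any real pitcher; the formula is simply the ERA gap converted to runs over the innings pitched.

```python
def runs_vs_league(era, league_era, innings):
    """Earned runs saved (positive) or given up (negative) relative to a
    league-average pitcher working the same number of innings."""
    return (league_era - era) * innings / 9.0

# Hypothetical career: a 3.50 ERA over 2,700 innings (assumed, not real data).
CAREER_ERA = 3.50
CAREER_INNINGS = 2700

# The same ERA measured against a high-scoring and a low-scoring era.
in_hitters_era = runs_vs_league(CAREER_ERA, league_era=5.00, innings=CAREER_INNINGS)
in_pitchers_era = runs_vs_league(CAREER_ERA, league_era=3.00, innings=CAREER_INNINGS)

print(f"League ERA 5.00: {in_hitters_era:+.0f} runs")
print(f"League ERA 3.00: {in_pitchers_era:+.0f} runs")
```

Under these assumed numbers the same 3.50 ERA is well above average in one setting and below average in the other, which is the ambiguity described above.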
In the 1950s, a two-to-one strikeout to walk ratio was a GREAT ratio. No starting pitcher in the American League had a two-to-one strikeout to walk ratio in 1947, in 1948, in 1949, in 1950, in 1951, in 1953 or in 1954. Now, a two-to-one strikeout to walk ratio is not only below average, it is WELL below average. So when you look at a pitcher’s record and see a two-to-one strikeout to walk ratio, you don’t know what that means. It could be a great ratio; it could be below average.
This is true of every stat, except Wins and Losses. Complete Games, Shutouts, Saves, Hits per Inning, WHIP. . .they are ALL misleading because standards change radically over time—except Wins and Losses.
Yes, of course it is true that the quality of his teams impacts a pitcher’s career won-lost record, and of course it is true that random factors (and the fact that the stat is poorly defined) will cause won-lost records to be deceiving even when comparing teammates. But this is also true: that in looking at a starting pitcher’s CAREER record, the MOST reliable piece of evidence (in the basic record of a pitcher) is his won-lost record. The random effects more or less disappear over the course of 300 decisions or more. The flaws in the game-by-game assignment process are not hugely significant over time, although they are hugely annoying in isolated cases. And most pitchers, over the course of a career, pitch for a more-or-less even balance of good teams and bad teams.
Yes, the won-lost record, even over the course of a career, is not a PERFECT summary of the pitcher’s positive and negative contributions to his team—but it is better than anything else in the pitcher’s basic career record. The distortions in a Won-Lost record, over a career, are clearly less than the time-and-place distortions of Earned Run Average, WHIP, Strikeouts and Walks, and every other statistical category.
Even in a single season, the park-effects distortions of ERA can be larger than the distortions in the Won-Lost record. In a single season it is a fair fight between the two, and the ERA would win the fair fight more often than it loses, but the ERAs of pitchers in Colorado, for example, are in no way a reliable indicator of how well those pitchers have pitched.
So Brian is trying to get rid of what is actually the BEST information in a pitcher’s record (for a starting pitcher, over a substantial career.) That does not seem to me to be wise. It doesn’t seem to me like it is going to work, either; I think that we COULD persuade the baseball community to reform the pitcher’s Won-Lost record, if we could somehow agree to act together on that issue, but we will never persuade them to get rid of Wins and Losses—nor should we be trying to do so.
But I have not yet addressed the real issue. The real issue is, does the won-lost record contain ANY useful information which is not contained anywhere else in the pitcher’s record? We KNOW what the park effects are. We KNOW what the league norms are for ERA, strikeouts and walks. If we got rid of the won-lost record, Brian might argue, we could replace the information, and do better, with information created by the things that modern analysts know.
But can we? I don’t know. The best, most honest answer to the question of whether there is any useful information contained in the Won-Lost record and not found elsewhere in a pitcher’s record is that I don’t know, for certain, and (a) you don’t know, either, I don’t believe, and (b) I am more inclined to believe that there IS information in there which would otherwise be overlooked than that there is not.
I thought the opposite, in the 1970s and the 1980s; I believed the opposite to be true at that time, and I played a role in convincing Brian and many others that this was true. I may be primarily responsible for the belief, in the sabermetric community, that Wins and Losses are a useless artifact of an old way of thinking about pitchers; I am not saying that I am responsible for that, but that I may be. I will accept the blame for that mistake if you feel that I should.
Here is where I went wrong. In the mid-1970s, there was an article published about Clutch Hitting, which concluded that there was no such thing as an ability to hit in the clutch. It was one of the best sabermetric articles published up to that time, and it was decades ahead of its time in having the courage to take on directly one of the central elements of the baseball community’s understanding of why teams win and lose.
The approach used in that article was to compare performance in clutch situations in two consecutive seasons. In other words, the author looked at every at bat of two consecutive seasons—I believe it was 1969 and 1970, or 1970 and 1971, something like that—and isolated the "clutch" contribution of each player in one season, and then the other. His conclusion was that there was no relationship between the lists in the two seasons. The players who were the best clutch hitters in one season had no tendency—no tendency at all—to be the best clutch hitters in the following season, nor did the players who had been the worst clutch hitters in one season have any tendency at all to fail in the clutch in the next season. The patterns of who was "clutch" were simply random.
If an ability actually exists, the author argued (implicitly or directly, I don’t remember which). . .if an ability actually exists, then it must be persistent to at least some extent. If you look at the players who have speed in one season and those who are slow, you will find that the same players are fast and the same players are slow the next year. If you look at the players who hit for power in one season (or who don’t), you will find that they still hit for power (or they don’t) in the next season. This is true of every real ability. If a trait has NO tendency to persist, then it isn’t a real ability; it is just luck.
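The persistence test described above can be sketched as a year-to-year correlation. This is a minimal sketch, not the original article's procedure: the `pearson` helper and the six players' home run totals are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical home run totals for the same six players in consecutive seasons.
season_1 = [35, 28, 12, 5, 20, 2]
season_2 = [31, 30, 9, 7, 22, 4]

# A value near 1 means the trait persists; a value near 0 means the ranking
# in one season says nothing about the ranking in the next.
persistence = pearson(season_1, season_2)
print(f"Year-to-year correlation: {persistence:.2f}")
```

With made-up power numbers like these the correlation is close to 1, which is what the argument expects of a real ability.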
I was very impressed by this article and by this analytical approach, and I used this method to study many other issues in the late 1970s and the 1980s. One of those issues was whether there was any such thing as an ability to "pitch to the score", thus an ability to "win" the game which was separate from and distinct from an ability to prevent runs from scoring. I concluded that there was no evidence that there was such an ability.
But about 2003, 2004, 2005, I had a terrible realization. That method doesn’t work. It SEEMS like it ought to work, and it will work if the "x factor" that you are trying to isolate is relatively large compared to the dominant patterns in the data, but it doesn’t work at all—at all, at all, at all—when the factor you are trying to isolate is hidden by randomization, and also by other, larger elements in the data. "Randomization" means that some days the offense scores 8 runs; other days they get shut out, you just never know how many runs you will have to work with on a given day. "A larger element in the data" refers to, for example, a pitcher’s ERA; obviously a pitcher’s ERA is a LARGER element in his won-lost record than this "x factor", this ability to allow 3 runs when the team scores 4, so obviously the X factor is not the DOMINANT element of the equation.
I began to sense that this must be true about 2003, but it took me about two years to come to terms with the fact that it actually doesn’t work. It was hard for me, because I had done many, many studies over the years which relied on the assumption that that method WOULD work, that it would isolate an X factor if an X factor existed. I had to come to terms with the fact that I had misled many people on many issues—like Brian Kenny on this issue—because I had relied on a method that doesn’t work.
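The failure described above can be reproduced with a small simulation. This is a sketch under stated assumptions: every simulated player has a real, fixed "x factor", and each season's observed number is that skill plus noise standing in for both the randomization and the larger elements in the data. The function names, the 300 players and the noise levels are all hypothetical choices.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def year_to_year_correlation(skill_sd, noise_sd, players=300, seed=1):
    """Give every player a real, fixed skill, observe it in two seasons with
    independent noise, and correlate the two seasons."""
    rng = random.Random(seed)
    skills = [rng.gauss(0, skill_sd) for _ in range(players)]
    year_1 = [s + rng.gauss(0, noise_sd) for s in skills]
    year_2 = [s + rng.gauss(0, noise_sd) for s in skills]
    return pearson(year_1, year_2)

# The skill is equally real in both runs; only the surrounding noise changes.
visible = year_to_year_correlation(skill_sd=1.0, noise_sd=0.5)
hidden = year_to_year_correlation(skill_sd=1.0, noise_sd=10.0)
print(f"Skill large relative to noise: r = {visible:.2f}")
print(f"Skill small relative to noise: r = {hidden:.2f}")
```

In this model the expected correlation is the skill variance divided by the sum of the skill and noise variances, so about 0.8 in the first case and about 0.01 in the second. The second result looks like "no tendency to persist" even though the ability was built into every simulated player.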
Let me try to explain why it doesn’t work. A pattern of numbers may exist, and you may be able to see the pattern clearly when there is no interference. Like this:
[Grid: a mostly empty table in which the only entries are 9s, scattered so that they form a visible pattern.]
But the pattern becomes much harder to see when it is surrounded by random data. Like this:
[Grid: the same table with every cell filled by a digit from 5 to 9, so that the original 9s sit among random numbers.]
All of the original "9s" are still there; you just can’t see the pattern anymore because of all of the random numbers. A pattern also becomes more difficult to see when it is in competition with another, more dominant pattern. Like this:
[Grid: a mostly empty table in which the 9s are joined by a larger number of 8s that form a second, more dominant pattern.]
When a data pattern is in competition BOTH with a more dominant pattern and with random effects, it becomes totally impossible to find the original pattern. Data patterns can exist, in a mountain of data, which are so well hidden that they can resist thousands of efforts to find them. Patterns exist in nature which are so well hidden that they have resisted thousands of years of attempts to decode them. This is what science is, in a sense: it is an endless effort to find the patterns in nature which have been there for millions of years, waiting for us to find them.
Thus, the fact that we have not yet found convincing evidence of an "X factor" in the ability to Win games, or in the ability to hit in the clutch, is not proof that these things don’t exist; they may well exist, and we just haven’t looked at the data in exactly the right way yet. To find proof of clutch hitting using the method of that seminal mid-1970s article (and it was a very good article) would, I believe, require much more data than exists in all of the history of organized baseball. The fact that we can’t see any such pattern DOES mean, I believe, that these cannot be the dominant patterns in the data. Clutch hitting and an ability to pitch to the score (by pitchers) cannot be dominant patterns, or I believe that we would have found them by now, but they may still be hiding in the data as meaningful but recessive elements.
When I was young, I didn’t understand this. I thought that if I looked for the X factor in every way that I could think of to look for it and did not find it, then it must not be there. As a mature researcher, I have much more respect for the ability of the universe to hide its secrets.
So then, we face the basic question: Is it more reasonable to believe that this "X factor" exists, or that it does not exist?
It seems to me much more reasonable to believe that it does exist than that it does not. An ability to Win Games which is not reflected in the number of runs allowed could exist if the pitcher has an ability to allow 3 runs when he has 4 to work with, and 5 runs when he has 8 to work with. It could exist in the ability to pitch to the score, but it could also exist in other places. What I mean primarily is that an ability to Win Games which is not reflected in the number of runs allowed could be hidden in the variability of offensive conditions under which a pitcher must work.
Sometimes a pitcher for the San Francisco Giants must pitch in Colorado, since the Rockies are in his division; other times he gets to pitch in San Diego.
Sometimes he must pitch in 90 degree weather; other times he gets to pitch in 65 degree weather. Many, many more runs are scored in 90 degree heat than are scored in cold weather.
Sometimes he must pitch when the wind is blowing out. Other times he gets to pitch when the wind is his friend.
Sometimes he must pitch when the home plate umpire is giving pitchers the edges of the plate; other times he must pitch when the home plate umpire is calling those pitches balls, and the first base umpire won’t call a checked swing a swing.
Sometimes the pitcher gets to pitch when the pitcher’s mound is in the sunlight, and the batter’s box is in the shadows. Other times the light conditions are not an issue.
When these conditions exist, they have an impact on all of the pitcher’s stats—except the wins and losses. But in every set of game conditions, there is one winning team and one losing team. If these game condition variables do not even out, over the course of a season, then there MUST be information about the ability of the pitchers which is contained in the won-lost record, but which is not contained anywhere else in the record.
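One way to see why game conditions move run-based stats but not wins is the familiar Pythagorean approximation of winning percentage. It is used here only as an illustrative assumption and is not part of the Win Shares method described in this article. The function name, the run totals and the 1.3 scoring multiplier are all hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Approximate winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Hypothetical pitcher: 4.5 runs of support and 4.0 runs allowed per game
# under neutral conditions.
SUPPORT = 4.5
ALLOWED = 4.0

# Assume heat, wind and park raise scoring for BOTH teams by 30 percent.
CONDITIONS = 1.3

neutral = pythagorean_win_pct(SUPPORT, ALLOWED)
hitters_day = pythagorean_win_pct(SUPPORT * CONDITIONS, ALLOWED * CONDITIONS)

print(f"Runs allowed: {ALLOWED:.2f} vs {ALLOWED * CONDITIONS:.2f}")
print(f"Expected win pct: {neutral:.3f} vs {hitters_day:.3f}")
```

Under this assumption the pitcher's runs allowed rise by 30 percent while his expected winning percentage does not change, because the same conditions inflate both teams' scoring. If those conditions are unevenly distributed among pitchers, the run-based record and the won-lost record will disagree for reasons that are not luck.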
Traditional baseball analysts asserted, until about 1980, that the won-lost record was what mattered most for a pitcher because the offensive support evened out over time. Once we actually studied that issue, we could easily demonstrate that that was false: offensive support does NOT even out over the course of a season, nor even necessarily over the course of a career, although it is much more even over the course of a career than over the course of a season.
But if we assume that offensive condition variables even out over the course of a season, does this not put us in very much the same position that we have already demonstrated to be false? It seems to me that it does. Suppose that two pitchers on the same team have the same ERA and the same innings pitched, but very different won-lost records. We could call these two pitchers Chris Sale and Jose Quintana, 2015, or we could call them Kyle Lohse and Yovani Gallardo, 2014, or we could call them Gio Gonzalez and Jordan Zimmermann, 2012, or we could call them Miguel Batista and Jarrod Washburn, 2007, or we could call them Kenny Rogers and Nate Robertson, 2006. (Jose Quintana in 2015 had a 9-10 won-lost record despite a 3.36 ERA in 32 starts. His teammate Chris Sale posted a 3.41 ERA in 31 starts, and also allowed more UN-earned runs than Quintana, but still finished 13-11.) Is it likely that ALL of the difference between them is luck, and that none of it is due to undocumented variation in offensive conditions, or is it more likely that SOME of the difference is due to undocumented variation in offensive conditions?
To explain that Chris Sale received the support of 3.81 runs per start and Jose Quintana only 3.59 is no help, first because the difference is not large enough to explain the divergence in won-lost records, but also because an undocumented variation in offensive conditions would CAUSE such a difference, just the same as luck would cause it.
Unresolved scientific questions ultimately come down to a test of "What do you believe?" It seems to be more consistent with the nature of the universe to believe that the won-lost records contain SOME useful information about the pitchers, rather than that they contain none, and so I am going to use the records as a small element in these Win Share calculations.