How Reliable are Won-Lost Records, Part I

By Bill James

September 12, 2022

This article is a companion piece to the article "A Reliable Batting Average", which was posted here on August 22. "This article" is actually a series of five articles, which will run all week, I hope and assume. The fifth article isn’t finished yet, but I don’t anticipate a problem. But first, a brief essay about (a) Won-Lost Records and (b) Twitter.

In baseball’s primordial soup, a pitcher’s won-lost record was the be-all and end-all of his performance. If a pitcher was 18-14, that was what he WAS, period. ERA was there, but given nothing like the weight attached to won-lost records; strikeouts were seen as interesting but perceived as accoutrements, rather than basic elements. Wins and Losses were a person’s clothes; ERA was like the shoes, and strikeouts and walks were like socks and neckties, earrings and lipstick. Clothes make the man. WHIP and WAR and a thousand things like them didn’t exist. If one pitcher was 18-14 with a strikeout/walk ratio of 270 to 45 and an ERA of 2.20 and another pitcher was 18-14 with a strikeout to walk ratio of 140 to 120 and an ERA of 4.15, the two of them were seen as equals. They were both 18-14. HOW they did it wasn’t important; they got the same results. A Hall of Famer was a pitcher who won 20 games regularly; it was that simple.

What sank that ship was logic, reason and research. I fired more of the torpedoes than anyone else did. I actively campaigned, throughout the 1980s, to get people to stop relying on won-lost records. There was a series of Cy Young Awards in those years that illustrate the point of the previous paragraph. Steve Stone, entering the 1980 season with a career won-lost record of 78-79, went 25-7. He went 25-7 with an ERA of 3.23 and a strikeout/walk ratio of 149-101, a WHIP of 1.297, but he went 25-7. Mike Norris in the same league and season pitched 34 more innings than Stone, struck out more batters and more batters per inning and walked fewer, had a better ERA (2.53) and a WHIP of 1.048, but he was only 22-9. Stone won the Cy Young Award. A relatively narrow advantage in his won-lost record outweighed ALL of Norris’s better stats: ERA, strikeouts, walks, innings, etc.

Not that Stone was truly bad; I have him as maybe the fifth-best pitcher in the league that season, which is still good. We’ll get to that on Friday. In the years just following there was a series of similarly outrageous Cy Young votes. In 1982 Pete Vuckovich won the American League Cy Young Award. He is probably the worst pitcher ever to win it. He had 105 strikeouts, 102 walks, a 3.34 ERA, gave up more than a hit an inning and was not in the top 10 in the league in innings pitched. Wasn’t close to the top 10. Dave Stieb led the league in innings pitched (64 more than Vuckovich), had more strikeouts and strikeouts per inning, fewer walks, a better ERA, etc., just as Norris had had over Stone. In modern terms, Stieb had 7.6 WAR. Vuckovich had 2.8. But Vuckovich, supported by 5.23 runs per game, went 18-6, while Stieb, supported by 3.92 runs per game, went 17-14. In that era, that was all that counted; Vuckovich got the big award.

The following year an obese drug dealer named Lamarr Hoyt, supported by 5.36 runs a game, won the American League Cy Young Award despite a 3.66 ERA. Stieb led the league in pitcher’s WAR again that season with 7.0; Hoyt was nowhere near the top 10. Fighting against this type of logic, in that era, was like shooting fish in a kitchen aquarium, because the awards were so terrible.

I led the charge against won-lost records, and I had good reasons. But we have reached the point at which the dislike of won-lost records, distrust of won-lost records, is so intense that any mention of them evokes scorn and derision from the back of the room. You can’t even call it ‘distrust’ anymore; it is visceral. It’s animosity, like the relationship between Duke and North Carolina. It has gone beyond reason.

One thing I did not understand in the 1980s; well, actually I never understood this until a few days ago, when I was doing this research. What happened in the 1980s is more unusual than I understood. Pete Vuckovich in 1982 had a .750 winning percentage (18-6), but a true winning percentage of .514. In my data I have 3,905 seasons of 30 or more pitcher starts. There are only 4 seasons in which a pitcher had a winning percentage of .750 or higher with a true winning percentage under .520. Regarding Steve Stone in 1980, he won 25 games (25-7) with a true winning percentage of .564. There’s a 217-point gap between his winning percentage and his true winning percentage. There are only two other pitchers in my data (George Uhle in 1923 and Bob Welch in 1990) who won 25 games with a lower true winning percentage.
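The gaps quoted above are simple arithmetic, and a small sketch makes them easy to check. The "true winning percentage" figures (.564 for Stone, .514 for Vuckovich) are taken as given from the paragraph above; how they are derived is not described in this part of the series, so the code does not attempt to compute them. The function names are illustrative only.

```python
def winning_pct(wins, losses):
    """Ordinary winning percentage: wins divided by total decisions."""
    return wins / (wins + losses)


def gap_in_points(wins, losses, true_pct):
    """Actual minus 'true' winning percentage, in points (thousandths).

    true_pct is an input here: the values come from the article, and the
    method behind them is not reproduced (assumption: treated as given).
    """
    return round((winning_pct(wins, losses) - true_pct) * 1000)


# Steve Stone, 1980: 25-7 with a true winning percentage of .564
stone_gap = gap_in_points(25, 7, 0.564)

# Pete Vuckovich, 1982: 18-6 with a true winning percentage of .514
vuckovich_gap = gap_in_points(18, 6, 0.514)

print(f"Stone: {winning_pct(25, 7):.3f} actual, gap of {stone_gap} points")
print(f"Vuckovich: {winning_pct(18, 6):.3f} actual, gap of {vuckovich_gap} points")
```

Stone's 25-7 works out to .781, which is where the 217-point gap comes from; by the same arithmetic Vuckovich's 18-6 sits 236 points above his true winning percentage.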

On August 23 this year I posted the following tweet:

1980s pitchers who COULD get Hall of Fame consideration include Fernando Valenzuela, Dwight Gooden, Bret Saberhagen, Dave Stieb and Dave Stewart. If one these WAS to be selected, which one would you prefer?

Valenzuela (173-153)

Doc (194-112)

Saberhagen (167-117)

Stieb (168-129)

Somebody replied to this "With all of the pitcher’s stats available, why did you choose to show won-lost records?" I blocked him immediately.

Why would you block somebody for that? I can hear people asking. It was a rude question. I have a very simple policy; if you’re rude to me on Twitter, I block you. I’m not on Twitter to have negative relationships. I’m not on Twitter to argue. My experience is, if somebody is rude to you once, they’ll be rude to you again. Twitter becomes a LOT more pleasant if you just block all the people who are rude. I’d LIKE to be able to explain to people that I regarded the tweet as rude, and please watch your manners, but my experience is that 100% of the time, people in that situation will come out swinging.

Why was that rude? Suppose you are going to a party, and you say to your wife "With a closet full of clothes, why are you wearing that?" You think she’s going to be OK with that question? Or suppose you are in a business meeting, and after the meeting your colleague asks "Of all the questions you could have asked in the meeting, why did you ask that one?" You OK with that? The answer is, "It isn’t up to you. That was my choice; you make yours."

The questioner is implicitly asserting his belief (that won-lost records are not the right statistic to be used to frame the question) and is then trying to force me to defend what I have done. I disagree with his opinion; won-lost records ARE exactly the right statistic to use in that situation. That’s fine; he’s entitled to his opinion, but he has no right to push me into a corner and force me to defend mine. He’s trying to start an argument.

I understand that it is different when you have a normal number of followers, 50 or 100 or 200 or 500. When you have a large number of followers, it becomes counterproductive to try to argue on Twitter. First of all, it’s difficult to explain your position on Twitter because the space is so limited. Having an argument on Twitter is like having a boxing match in a phone booth.

And second, a hundred other people are going to jump into the argument with 50 other positions. If you tweet "Brooks Robinson was the greatest third baseman of all time," somebody else will say "Graig Nettles was better." That’s fine; that’s his opinion. But if you try to argue the point, one guy will jump in and say "Those guys were Boomer favorites. Scott Rolen was way better than either of them." And another guy will say, "Adrian Beltre, man. Adrian could put a brick in his shoe and still make every play Brooks Robinson ever made." And somebody else will talk about Nolan Arenado’s fielding percentage, and somebody else will demand to know why you would say Brooks Robinson was the greatest third baseman ever when Mike Schmidt was so much better with the bat, and somebody else will talk about Robinson being over-hyped after he made those great plays in the 1970 World Series, and somebody will argue that Robinson was overrated because he wasn’t that great at chasing down a pop up over his head, and somebody else will say that great defensive third basemen are just guys who weren’t quick enough to play shortstop. Somebody will say that Gold Gloves are all politics, and it doesn’t mean anything, and somebody will say that they have done the greatest fielding analysis ever done, and the best third baseman ever was Willie Kamm, and somebody will talk about Eddie Mathews being underrated. It becomes impossible to answer all of these different arguments.

It’s not productive. A productive argument moves in a relatively straight line; I make my points, you make yours. It’s impossible to have that on Twitter. A Twitter argument becomes a cluster bomb. Nobody learns anything.

OK, so that’s why I blocked that guy. Another guy replied "Do you believe W-L a viable stat for Hall of Fame consideration?" In a sense, that is like asking "Do you believe the primary system is a viable way to select Presidential candidates?" I don’t like the primary system, but that’s still how they are chosen; it doesn’t work worth a crap, but that’s still the system. Won-Lost records ARE the way that Hall of Fame pitchers are selected; you can like it or not like it, but it is still true. But that’s not the answer I gave him.

The answer I gave him was, "The won-lost record is the easiest way to describe the shape and size of a starting pitcher's career. You don't RELY on it; you don't assume it is an accurate reflection of his value, although it is probably 80% accurate over the course of a career."

The first part of that answer ("the won-lost record is the easiest way to describe the shape and size of a starting pitcher’s career"). . . I think that is clearly and obviously true. A won-lost record is a two-syllable, seven-letter word; OK, seven characters, including the hyphen. A seven-letter, two-syllable word like "answers", "believe", "raining", "snowing", "replied", or "basemen". There is no WORD you can use that tells you as much about a starting pitcher’s career as his won-lost record does, and there is no other statistic you can substitute in there which accomplishes as much, or anything like as much. There is no other way to come remotely close to conveying the same amount of information with two syllables.

But the second half of my response, I got to wondering about that later. Is it reasonable to say that career won-lost records are 80% accurate? How would you determine that?

Just the previous day I had published the article "A Reliable Batting Average." I knew that the method I used to measure the reliability of batting averages over X number of at bats was totally unworkable for pitchers, but. . . there has to be some way to do it, right? I don’t know if 80% is a good number or a bad number, so how would you determine that?

So that’s the second part of the article. I’ll tackle that one starting tomorrow.

COMMENTS (5 Comments, most recent shown first)

RMc
Look, Bill, if I want to see an angry grandpa rant, I'll watch Joe Biden.
5:59 PM Sep 20th

abiggoof
I generally agree with Tango that a career record will likely have a pretty good reliability compared to a single season, but I am curious what impact good or bad teams over a career will make. If most of your career is with a stinker, does it ever even out? If you are Palmer or Ford or Pettitte or Drysdale, what percentage is greatness, what percentage being on a winner — and does that ever even out? Also, is there a difference between a great record on a team that relies on pitching strength versus one that is balanced or slugging? I would imagine being on a so-so hitting team which wins 90 is at least partially positively impactful on a pitcher because they probably have strong relievers and defense.
9:54 AM Sep 13th

bjjp2
Bill wrote an article in August of 2017 where he evaluated all the Cy Young Awards, comparing pitchers’ actual W-L records to “deserved” W-L records. Not sure why something like the same methodology wouldn’t work here. Anyway, among other things, Bill concluded that “Of the 102 awards which have gone to starting pitchers, 72 went to pitchers who either (a) clearly deserved the award, or (b) were close enough to deserving the award that I wouldn’t want to argue the issue. The other 30, in my view, were given to the wrong pitcher because of discrepancies in won-lost luck.”
10:44 PM Sep 12th

tangotiger
A pitcher's career Won-Loss record is a much different thing than a seasonal won-loss record, which I presume we will see with the "reliability" work that Bill will show.

The real problem is folks will see that someone has a 300-200 record, see that it works, and therefore decide that W-L records are reliable, and so, will say that it also works at the seasonal level.

That's really the main issue. There are of course other issues.

More importantly though, we can get to wherever we want to get without relying on the "official" W/L records. We can create our own version.
9:00 PM Sep 12th
