Defending the Win

May 24, 2014
I’m going to write some articles about wins.
I realize that this makes me a Luddite in the toaster-oven factory, but I’m a bit annoyed at all the shade getting thrown on the humble ‘win’ statistic. Acknowledging all of the flaws with the statistic that make Brian Kenny’s blood boil, I wonder if we’ve gone too far in disregarding wins.
Take Warren Spahn.
(No, seriously: take him. Please. He won’t leave my living room.)
Warren Spahn has 363 wins. That means that there were 363 games where Warren Spahn pitched at least five innings, and left the game with the lead (or saw his teammates pull ahead in the bottom half of the inning), and saw his team win. 
I don’t know about you, but I like knowing that. I don’t credit it with too much: I don’t say that those 363 wins make Warren Spahn the best left-handed pitcher in baseball. I realize that he pitched poorly in a lot of those games, and I realize that he pitched well in games where he gets credited with a loss. I know it’s not the exacting measure of how great a pitcher he was.
But...I don’t know what statistic better captures Warren Spahn.
‘Winning 20 Games’ is another stupid metric. Okay, it’s not a metric, but a concept: Good Pitchers Win Twenty Games.
This concept has given us plenty of bad Cy Young Awards (LaMarr Hoyt, Bob Welch, Mike Flanagan, Mike McCormick, Dwight Gooden). It’s led to some terrible contracts (Barry Zito). It should probably be banished from the lexicon of any reasonable person.
Except when we’re talking about Warren Spahn, because Spahn won twenty games like clockwork. He won twenty like it was his job. He actually thought it was his job: every year he’d try to get to twenty. Most years he did.
Look at this. This is stat-sexy:
Winning twenty games is perhaps meaningless...but it looks great on the back of a baseball card.
Warren Spahn is my grandfather’s trump card: the guy he'll bring up if I get a little too excited about a current player: 
Me: Jon Lester’s looking pretty good tonight.
Him: How many times has Lester won 20 games?
Me: I don’t know. Once? Never? 
Him: (Long pause) Humph.
He doesn’t even have to say it...that ‘humph’ is my grandfather’s implicit nod to Warren Spahn.
It’d be silly for me to say that the ‘win’ statistic is meaningful because my grandfather cites it. My grandfather sometimes eats crushed saltines with milk when he has a cold: I don’t think everyone should necessarily follow suit.
But there are good reasons to defend the Win statistic:
- Many people have used the statistic to judge pitchers in the past. It’s shaped who is in the Hall of Fame, who’s won the big awards. It’s been a part of the broad conversation about pitching for a century. When we talk about Carlton’s 1972 season, we don’t say that he had an xFIP of 2.01, or an fWAR of 12.1. We say that Lefty won 27 games for a really terrible Phillies team.
- Pitchers care about wins. Every pitcher who has crossed into the 300-win club probably cites those 300 wins first...everyone except Nolan Ryan, who might cite a few other things first.
- It is not the worst measure of a starting pitcher’s value. There’s an article on this site, written by the gentleman on the letterhead, which suggests a pitcher’s Win-Loss record as a better indicator of a pitcher’s value than ERA, WHIP, or strikeout rate. It’s not a whole lot better than those other metrics, but I haven’t seen a ‘Kill the K/9’ campaign anywhere.
- Wins can still be useful: the statistic can show us interesting things about pitchers. (This will be the subject of Part II, which will start with Don Sutton.)
‘Wins’ are an imperfect statistic. This is a trait they share with every non-counting statistic that has ever existed. Any stat that requires mathematics more complicated than simply counting has a degree of imperfection: ERA, xFIP, WAR, Win Shares, and UZR ain’t perfect either. Even some of the counting stats are stupid: if you want to kill off a statistic, let’s cut off the three heads of the hydra-RBI.
Or runs scored. Let’s kill that stat.
No one ever talks about runs scored in a critical way. Why not? I’m watching the Jays/A’s game as I write this. Just a minute ago, with a runner on first, Brett Lawrie hit a grounder to first base. It was an absolutely easy out. Brandon Moss fielded it and threw to second to get the lead runner. Lawrie beat out the subsequent throw to first. Routine event: fielder’s choice. Lawrie basically took the spot that his teammate earned.
The next batter homered. Lawrie was credited for the run scored. Isn’t this a little off? Shouldn’t the batter who actually got to first safely, without costing his team an out, get some of the credit? 
Or let’s kill saves. Saves are a stat that is much worse than wins. If you want to talk about bad award choices, saves blow wins out of the water. Saves have taken Cy Young Awards from Nolan Ryan (1977, maybe 1987), Blyleven and Stieb (1984), Hershiser (1987, maybe 1989), Maddux (1989), and Mark Prior (2003). They’ve taken MVP awards from Musial or Jackie (1950), Rickey Henderson (1981), Eddie Murray (1984), and Kirby Puckett (1992). The only award a relief pitcher won that he might’ve deserved was Rollie’s 1981 Cy Young Award.
Worse, saves cause teams to employ dumb strategies. It is a statistic that is so important to salary demands that it influenced, for a few decades, when a manager used their best bullpen arm. This is eroding slightly, but it’s a significant mark against the save. Head-in-the-sand talking heads like to rant about stats ruining baseball: the save is a statistic that actually did hurt the game.
Or let’s not kill any stats. Let’s trust that people have the capacity to juggle multiple imperfect statistics, and place value on them accordingly. This is actually happening, incidentally. Even casual fans don’t treat the ‘W’ as the benchmark of pitching greatness, just like casual fans know that the save is sort of fluky.
We should remember that Felix Hernandez (13-12) won a Cy Young Award that saw CC Sabathia (21-7) finish third in the vote. It wasn’t a close vote: Felix received 21 of 28 first-place votes. No one raised a stink about this: there wasn’t a riot in New York. We should remember, too, that we haven’t seen a relief pitcher win a major award since 2003.
I honestly think we’re further ahead with wins and saves than we are with RBI. Joey Votto is still getting grief for not having enough RBIs, and Miguel Cabrera has two MVP trophies because he dominates the Triple Crown categories. The same isn’t true for pitchers: Kershaw didn’t lead the NL in wins in 2013, or come close to 20 victories...and he received 29 of the 30 first-place votes. Jose Fernandez finished 3rd in the vote, with 12 wins. Matt Harvey finished 4th...he had 9 wins. The AL Cy Young winner led the league in wins, but the guys who finished 2nd, 3rd, 4th, and 5th in the voting had win totals of 13 (Darvish), 14 (Iwakuma), 14 (Sanchez), and 11 (Sale). Bartolo Colon had 18 wins and finished behind all of them. You’re telling me we’re overrating wins?
Jeff Samardzija has zero wins this year. He also has a 1.46 ERA and a 2.87 FIP. Very few Cubs fans are blaming Samardzija for the team’s record, and when the trade deadline comes up the Cubs are going to get plenty of offers on the luckless right-hander. The masses are no longer beholden to the ‘win’ statistic. That doesn’t mean it isn’t useful.
Dave Fleming will post his second article on Wins in a few days, before returning to his WAR-citing, xFIP loving habits of old. He welcomes comments, questions, and suggestions here and at

COMMENTS (33 Comments, most recent shown first)

For Kid8: No.
5:02 PM Jan 13th
Hello Dave,

Were you serious about Dwight Gooden being a bad CY choice?

1:06 AM Aug 14th
I think that perhaps the main reason that the Cy Young voting seems to be ahead of the MVP voting is that word - valuable. Some voters will insist that a player can't be that "valuable" if their team had a losing record, or even if their team didn't make the playoffs. There is no such implicit handicap in Cy Young voting. I think everyone knows that the Cy Young award is supposed to go to the best pitcher, whereas some people think that there is a difference between the best player and the most valuable player. So, I'm only cautiously optimistic about future MVP votes: voters seem to be looking at more than just the triple crown stats these days, but those (along with team W-L record) will probably still dominate MVP voting more often than not for a few more years to come.
9:05 AM Jun 5th
I know it's difficult to communicate sarcasm on the page, but I figured the italics around 'Gooden' would be enough of a clue.
3:44 PM Jun 2nd
Seconding bjjp's motion re: Gooden. Who else ought to have won?

"...She says, not having read more recent articles yet and possibly missing an answer therein."
7:30 AM May 29th
Wow. Next thing you know, someone will be claiming batting average actually means something...

Great article. The fact it even needed to be written proves the old adage: common sense ain't that common.
11:53 AM May 27th
One minor quibble about this article: The saves statistic wasn't used as far back as 1950 so that there is no way it affected the results of the NL MVP award balloting in 1950. I'm not saying that the most valuable player won but just saying that one can't blame that result on the effects of the saves statistic.
7:38 AM May 27th
re: Warren Spahn. A measure of those 21 seasons is that eight times he led the league in wins, and nine times he led in complete games.

However, on inning count it isn't really fair to use innings per year with the more recent post-war SP usage.

My new kick is Jamie Moyer, what a freak, 105 Ws after age 40. Moyer, for instance, averaged 30 GS over his best 17-year stretch. Spahn averaged 34 for his core 17 years. You pitch 35 you lead the league lately, of course.

Jamie Moyer best 17 year span 212-131, avg 30 QS percentage something like 56 to 63 for Spahn.
12:29 PM May 26th
@Charlie: that was a good post. Even if WAR or FIP are much, much better standards/stats, the average Joe baseball fan doesn't know outstanding to disastrous, and the subtle shades in between that make the conversation, largely because of decades of chatter about those. That put it in a different light for me.
11:35 AM May 26th
I don't get the Dwight Gooden reference in this article. He won one Cy and it was clearly deserved.
9:25 PM May 25th
Warren Spahn's record from 1947-1963 was 342-211. As another reader pointed out, he pitched at least 245 innings in each of those 17 seasons; that reader couldn't find anyone in the post-war era who has done that. I can't find anyone either. Gaylord Perry is the closest.

Gaylord Perry's record from 1964-1980 was 285-223. He did not pitch 245 innings in 1964, 1965, 1977, or 1980, but his total innings were similar to Spahn's. Perry averaged 275 innings per year from 1964-1980; Spahn was at 278. Spahn likely would have averaged more than 278 innings if he had been playing a 162 game schedule, so his durability advantage over Perry is higher than 3 innings per year if you adjust for length of schedule.

So, within the context of this article, why did Perry win 57 games fewer than Spahn over his busiest 17 year peak?

Perry's ERA+ was 123; Spahn's was 127. That's worth a few wins; I don't know how many, but it doesn't seem like it would be more than one per year. Spahn also had 62 shutouts vs. 50 for Perry, and I think that probably accounts for a few more wins.

So you've got Spahn with an ERA+ that is 3% better. He pitches 3 more innings per year, and has 12 more shutouts in 17 seasons. We could also throw in that he was a better hitter than Perry, but that's not fair because there were 6 seasons in Perry's span where the DH was used. Spahn may have been a better hitter than Perry, but he wasn't a better hitter than Rico Carty or Larry Parrish or whoever was DH'ing for Cleveland and Texas in those years.

So the biggest difference between Perry and Spahn has to be the teams that they played on. So Dave, next time you see your grandfather, try to make the case that Gaylord Perry and Warren Spahn were comparable pitchers. It might be fun.
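The arithmetic in this comparison can be checked with a short sketch. It uses only the records and ERA+ figures quoted in the comment above; the dictionary layout and the `win_pct` helper are illustrative names, not anything from the original.

```python
# Figures as quoted in the comment above (1947-1963 for Spahn, 1964-1980 for Perry).
spahn = {"wins": 342, "losses": 211, "era_plus": 127}
perry = {"wins": 285, "losses": 223, "era_plus": 123}
SEASONS = 17  # length of each pitcher's span

def win_pct(record):
    """Wins divided by decisions (wins + losses)."""
    return record["wins"] / (record["wins"] + record["losses"])

win_gap = spahn["wins"] - perry["wins"]           # total difference in wins
gap_per_season = win_gap / SEASONS                # average difference per year
spahn_pct = round(win_pct(spahn), 3)
perry_pct = round(win_pct(perry), 3)
era_plus_ratio = spahn["era_plus"] / perry["era_plus"]

print(f"Win gap: {win_gap} ({gap_per_season:.2f} per season)")
print(f"Winning pct: Spahn {spahn_pct:.3f}, Perry {perry_pct:.3f}")
print(f"ERA+ ratio (Spahn / Perry): {era_plus_ratio:.3f}")
```

The gap works out to a bit more than three wins a season, against an ERA+ edge of roughly 3%, which is the mismatch the comment attributes mostly to the teams behind each pitcher.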
12:02 PM May 25th
@tangotiger: and it gives us a built-in understanding. We know that a 12-12 pitcher put up some good bulk, but his results were just league average. He might have been on a good team and didn't pitch well, he might have pitched well for a bad team, he might really have been average, he might have had some good luck or bad luck. But we know the results, intuitively: the results are average.

We know the spread of wins, more or less, that winning 20 isn't common and has a tendency to be something that happens to good pitchers. This isn't a feature of automatic centering, but a feature of loads and loads of chatter throughout the years.

Incidentally, batting average has the fact that we know the spread intuitively as well. Most years, average is about .260, and .300 is something special. It doesn't have the automatic centering feature, which hurts us when dealing with 1930 (when the NL hit .303, the pitchers too) and in 1968 (when the NL hit .243). But most of the time, I know what hitting .297 means and it's pretty good, even if it doesn't do so well at contract time.

Same story with RBIs, another flawed statistic: we know that a good hitter should have 100 or more. There are a million reasons why that isn't always so, but when we talk about Hank Greenberg in 1935, with what do we start? He drove in 170 runs. The rest of his statistics support someone who, playing in a hitters' park and with a good hitting team, would drive in 170 runs.

Or, in short, this is an asset of the traditional statistics. We've all talked about them so many times that we know what's an average value and what's a good value. Winning percentage for pitchers has a boost over the other ones since it has the automatic centering feature.
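A rough way to see what "automatic centering" would do for batting average is to index a hitter's average against the league figure. The sketch below uses the league averages quoted above (.303 for the 1930 NL, .243 for the 1968 NL); the function name `relative_average` and the 100-equals-league scale are assumptions for illustration, and no park adjustment is attempted.

```python
def relative_average(batting_avg, league_avg):
    """Index a batting average so that 100 means exactly league average."""
    return 100 * batting_avg / league_avg

# The same .297 hitter dropped into the two seasons mentioned above.
in_1930 = relative_average(0.297, 0.303)
in_1968 = relative_average(0.297, 0.243)

print(f".297 in the 1930 NL: {in_1930:.0f}")
print(f".297 in the 1968 NL: {in_1968:.0f}")
```

On this scale the same raw average sits slightly below league in 1930 and well above it in 1968, which is the context that raw batting average leaves out and winning percentage carries by construction.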
11:42 AM May 25th
Tango, I agree that the 3-2-1 isn't perfect or even solid, but I meant it was a reasonable guideline for enhancement for those who:

-- need brevity (like anyone using a plethora of examples without charts)


-- want/need to use the W/L to speak to less sophisticated traditionalists (most of the reading public)

Those numbers qualify the W/Ls by: confirming an unfair Loss to Wheeler; raising a questionable W/L season to Trachsel, and presenting a W/L that needs no qualification with Ford.

Replacing them with FIP, ZWAP and ZOOM would go over 99.99 percent of the reading public, i.e.: Bill's book market.

10:36 AM May 25th
I think of wins as primarily a narrative stat rather than an evaluative one, and still useful in that regard.
9:26 AM May 25th
And also that if you used ERA relative to league average, that it was higher than win%.

The key point is that win% always has an average .500 in any year, league. That's its virtue, that it has its automatic built-in index.
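The built-in index follows from bookkeeping: every decision hands out exactly one win and one loss, so league-wide wins always equal league-wide losses. A minimal sketch, using a made-up list of decisions (the pitcher labels and results are hypothetical):

```python
from collections import Counter

# Hypothetical decisions: (winning pitcher, losing pitcher) for each game.
decisions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B")]

wins = Counter(winner for winner, _ in decisions)
losses = Counter(loser for _, loser in decisions)

total_wins = sum(wins.values())
total_losses = sum(losses.values())

# One win and one loss per decision, so the league mark is always .500.
league_win_pct = total_wins / (total_wins + total_losses)

for pitcher in sorted(set(wins) | set(losses)):
    print(f"{pitcher}: {wins[pitcher]}-{losses[pitcher]}")
print(f"League winning pct: {league_win_pct:.3f}")
```

Individual records can land anywhere, but the league total cannot move off .500, whatever the scoring environment of the season.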

7:07 AM May 25th
It's worth reminding ourselves that when Bill wrote that piece about pitching stats (was it back in February?) the number that rated surprisingly high was winning percentage, not raw wins and losses.
2:30 AM May 25th
Wins have diminished as an indicator of fine pitching over the years with fewer innings pitched by starters. But through the 60s and into the 70s, they were obviously more important than now, since so many starting pitchers pitched so much deeper into games.
Guys in those years with 15-20 wins and 10 complete games should have been proud of their win totals. Most were earned and what's the adage: You're only as good as that day's starting pitcher.
But even today with bullpens playing a more prominent role, wins are an indicator.
But...maybe someone should track games where the pitcher really did earn a win. Look at the game for what the starter did, what was his impact on a victory.
Or, in the win-loss category, put in the record of the pitcher's team in games he starts.
12:03 AM May 25th
"Wheeler: L, 6.2 IP, 2 ER"

The L tells us virtually zero about Wheeler. It tells us about his circumstances (bad offensive support and/or bad bullpen support), but not about him. The 2 ER tells us something about Wheeler, but it could also be his fielders. The 6.2 tells us something about him as well.

"Trachsel: 15-8, 4.97 ERA"

Again, the W-L tells us almost nothing about Trachsel, IF we already have his ERA. It *could* tell us that he played in hitter's parks or in a season that was a high scoring environment.

But, we don't need to infer that, since we can already calculate that. You can toss the 15-8 AND the 4.97 out the window, and ask for the ERA- and IP instead. You'll get better information if you want two stats.

"Whitey Ford 236-106"

At the career level, a good amount of noise is reduced. Not enough. And as I said, we don't need to infer something if we can already calculate it. If you are limited to two numbers, instead of W and L, then limit yourself to ERA- and IP. Naturally, you don't need to limit yourself.

And as I keep saying, once you have ERA-, FIP-, and IP, what is W and L going to give you? Nothing other than a tie-breaker.

8:57 PM May 24th
""That said, the career W, RBI, R, etc leader boards are packed with HOF'ers.""

Also, teammates of Frankie Frisch.

I think that longevity is a legitimate factor to toss into our usual excellence discussion for HOF (Jamie Moyer, anyone?)
8:34 PM May 24th
Good stuff Tom. Always appreciate very much the measured tone, as well as the industry-leading analysis.


As we all know, practically the FIRST thing :- ) Bill did, was to kick over the idea that W's are the BEST way to evaluate a pitcher. I remember some comment where he poked fun at (Jack Morris'?) "knowledge of how to win," saying, "Give Bill Krueger or Bill Wegman six runs a game, and they'll have this knowledge, too…"


That said, the career W, RBI, R, etc leader boards are packed with HOF'ers. Tom addresses this in part in this thread, I think.

I would gingerly go a bit further: Show me a 17-game winner from last year, and *generally* I'll show you a very good pitcher there, too. What, 80% of the time? Last year was Scherzer, Wainwright, CJ Wilson … year before, Gio, Price, Dickey, Jared…


Naturally, xFIP contains more *predictive* value than W. But (1) the record of what they DID do is important, too, and

(2) There may be information contained in the W statistic that is beyond our current perception. I certainly know that Brandon Maurer doesn't know what to do with a 3-0 lead :- )

7:47 PM May 24th
How is this? If you want to use W/L, use three stats to describe an outing, two stats to describe a season, and it can stand alone on a lengthy career.

Wheeler: L, 6.2 IP, 2 ER
Trachsel: 15-8, 4.97 ERA
Whitey Ford 236-106

7:41 PM May 24th
You're making stuff up about the pomposity. Rather than arguing the merits of the topic, you instead argue about a sentence. If I have to be careful with every sentence I write to make sure that someone isn't offended, I won't write anything.

Look at Bill. He said something about being out of our minds in how some of us treat the W/L record. I ignore that statement. Otherwise, what's the point of arguing how he's delivering each statement?

Stick to the merits.
4:25 PM May 24th
"we cannot trust the users to use the stats properly" This statement is so littered with the kind of pomposity that is causing a backlash against the new SABR crowd I don't know where to begin with it. HUBRIS is becoming such a huge problem in this country it makes me laugh how full of it some people are.
3:20 PM May 24th
Thanks Dave. Nice article - I'm looking forward to part II.
2:04 PM May 24th
I have my usual mantra locked and loaded (look at everything, people) of course, but I have something else to ask you guys: Why, in this age of statistical overload, do we not use OPS for pitchers? Some people use it for hitters as a thumb-rule stat; why hasn't it caught on for pitchers? In some ways it might be an even more accurate stat for them; the issues with lead-off men and OPS don't apply, since they face all nine spots in the order. Anyone?

Nice stuff as usual, Dave, but I miss the soliloquy.
1:33 PM May 24th
[quote]Who is a better pitcher, one that goes 20-10 or one that goes 10-20? Odds are, it's the one that went 20-10, isn't it? Who is the better pitcher, one that has a 3.00 ERA in 200 innings or one with a 4.00 ERA in 200 innings? Odds are, it's the one with a 3.00 ERA. [/quote]

Yes, absent all other information, you are correct. But, we are not lacking that information. It's not like we're looking at some high school kids and all we have are their W/L records.

We have IP, ER, K, BB, H, HR, the team-level, park-level, league-level data. We have all that.
11:48 AM May 24th
I'm talking about comparing ERA and W/L record of pitchers on the same team in the same season.

All these other examples, the career-level, the comparing 2006 to 1906, etc, is a different argument.
11:06 AM May 24th

Who is a better pitcher, one that goes 20-10 or one that goes 10-20? Odds are, it's the one that went 20-10, isn't it? Who is the better pitcher, one that has a 3.00 ERA in 200 innings or one with a 4.00 ERA in 200 innings? Odds are, it's the one with a 3.00 ERA.

The problem of course is that the pitcher with a 3.00 ERA might have done that in 1906 in a pitchers park while the 4.00 was done in a hitters park in 2000. We know we have to adjust for Park, Era, Unearned Runs and a host of other things. It doesn't mean we should throw out ERA because it's grossly flawed.

The same thing goes, to my way of thinking, for Wins. There are things that one needs to adjust. Wins is not useless; one just needs to know what adjustments to make.

10:48 AM May 24th
Dave picked a very strong example to make his case. Warren Spahn was a very great pitcher of a particular type--perhaps the greatest pitcher of that particular type. The characteristics of that type are:

1. He had a few truly dominant seasons. In Spahn's case, looking at his WAA in baseball-reference (which probably undervalues his best seasons a bit), he was between 3.9 and 6.6 in 1947 and 1951-3. That is one of the most dominant records of his entire generation, the GI generation, born 1903-24, although players in other generations have had considerably more such seasons than he.

2. He was an extraordinarily durable pitcher, one of the most durable in history. He pitched 245 or more innings every year from 1947 through 1963. That could well be a record, although I haven't researched it. And he remained a game or two better than average for the whole of that time.

3. He spent most of his career in two very good pitcher's parks, Braves Field and Milwaukee County Stadium, which helped his ERA.

4. He had the very good fortune, for a pitcher, to count Henry Aaron, Eddie Mathews, Joe Adcock, Red Schoendienst, Wes Covington, and several other good hitters among his teammates. He was also an above average hitting pitcher himself.

All that combined to allow him to win 363 games.

The most difficult thing to evaluate about such a pitcher, I think, is the durability factor, the ability to pitch 250 innings a year for many years at a slightly above average level. That kind of pitcher is indeed, I think, very valuable to a team with significantly above-average offense, but he can't be your MVP.

But to see how misleading wins can be, let's take a teammate of Spahn's, Lew Burdette. Now no one has ever said he was in Spahn's class and he hasn't gotten much HOF support but the perception was that he was a very good pitcher.

Burdette, as it turns out, was only 2 WAA once, and he was only over 1 WAA three times--and all those seasons were before 1957. His figures for 1957-9, when the Braves either won or tied for the pennant, were -1.3, .7, and -1.5 WAA. During those three seasons, he won 17, 20, and 21 games. I cannot escape the conclusion that the Braves' offense and luck played the predominant role in running up those figures.

Of course, Burdette's reputation was enhanced when he beat a relatively weak Yankee team three times in the 1957 World Series. But in the 1958 series he gave up 5 runs in his first start (which he won), 7 runs in his second, and 6 runs in his third. The Yankees had figured him out in 1958.

It is true, but sad, that wins play a huge role in deciding who gets into the Hall of Fame. That's why Dave Stieb, for instance, who was more valuable in his four or five best seasons than Warren Spahn was, will never get anywhere near the Hall. That's also why so many of the pitchers in the Hall played for highly successful teams for most of their careers. Now of course, anyone who won 300 games was a great pitcher, but I think there are a number of 200 game winners who were never great enough to be in the Hall. And I think that wins are a very misleading statistic for many pitchers who are only average, or a little above average, and who happen to be in the right place at the right time.

10:20 AM May 24th
Dave and Tom:

We forget sometimes that statistics are used by many different people for many different things. W-L record has affected award selections, including both the HOF and CY Young, but that's a relatively minor issue. The real value in advanced statistics such as FIP, WAR, game scores, and others, is in their usefulness as tools to make decisions on which players to pay, which ones to keep, which ones to use and how to use them, etc. These decisions affect the outcome of games, and are better measures of a player's value than W-L. Wins and W-L records are easy to track, easy to state, and have been in use for a long long time. Fans like these measures. BBWAA members like these measures. No harm, no foul. The potential harm comes when player personnel directors, GMs, and owners use these stats as measures of value.
10:16 AM May 24th
As for the imperfect argument: the more imperfect, the less the weight. The seasonal pitcher W/L is so imperfect it deserves virtually no weight.

Given a pitcher's rWAR, fWAR, and W/L record, the W/L record would be given so little weight that we'd use it only as a tie-breaker.

At the career level, the W/L record's imperfections get reduced somewhat. But, the main argument against the W/L record is at the seasonal level.
8:56 AM May 24th
There is practically zero chance that Shark is going to finish in the top 5 in Cy Young voting if he leads the league in ERA and has a 0-10 W/L record.

Cliff Lee was 6-9 and Cole Hamels was 17-6, pitching for the same team, with the same offense, same bullpen, same park, in 2012.

In the same year, Peavy was 11-12 and Sale was 17-8, with performances that were similar.

Of course Lee and Peavy were going to get zero consideration for the Cy, and neither got any votes. Hamels got 1 and Sale got several votes.

In 2010, Felix got the love, but look at Jered Weaver, with the same 13-12.

The problem is that the SEASONAL pitcher W still gets too much love. That's the problem. We cannot trust the users to use the stats properly.

At the career level, sample size washes away a lot of problems.

8:40 AM May 24th
©2021 Be Jolly, Inc. All Rights Reserved.