By Bill James

February 25, 2014

**1. Setup**

This is a two-part article. Today I am going to explain the concept and solicit your comments anticipating the results, and tomorrow I will tell you what the results of the study were.

Which pitching stats are most closely connected to value? Suppose that you know two facts about two pitchers: each pitcher’s ERA, and his won-lost record. Suppose that the two disagree; one pitcher has the better won-lost record, but the other has the better ERA. Which one is more likely to be the better pitcher in fact?

Suppose that you know his strikeouts, or his walks. Which would you rather have? Suppose that you know his strikeout to walk ratio, or his WHIP (Walks plus Hits per Inning Pitched). Which would be better? Suppose that you know his ERA relative to the league, or his WHIP?

I have performed a set of studies to address these issues. This is what I did. I took all pitchers in history who pitched 199.0 to 201.0 innings. It is a total of 239 pitchers. Then I compared each pitcher to each other pitcher, asking three questions:

1) Which pitcher is better in this category?

2) Which pitcher is better in fact? and

3) Do these two agree?

The more often a category agrees with the bottom line, the more closely it is connected to Value.
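The three questions above can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet actually used in the study; the function name `agreement_record`, the `(category_value, true_value)` tuple layout, and the sample numbers are all assumptions invented for the example. Pairs tied in either measure are skipped, since a tie gives no yes/no answer.

```python
from itertools import combinations

def agreement_record(pitchers, higher_is_better=True):
    """Count how often a category and 'true value' agree across all pairs.

    pitchers: list of (category_value, true_value) tuples (hypothetical layout).
    Returns (agree, disagree); pairs tied in either measure are skipped.
    """
    agree = disagree = 0
    for (cat_a, val_a), (cat_b, val_b) in combinations(pitchers, 2):
        if cat_a == cat_b or val_a == val_b:
            continue  # a tie gives no yes/no answer
        # 1) Which pitcher is better in this category?
        a_better_in_category = (cat_a > cat_b) == higher_is_better
        # 2) Which pitcher is better in fact?
        a_better_in_fact = val_a > val_b
        # 3) Do these two agree?
        if a_better_in_category == a_better_in_fact:
            agree += 1
        else:
            disagree += 1
    return agree, disagree

# Made-up (ERA, WAR) pairs, for illustration only; lower ERA is better.
sample = [(2.50, 6.0), (3.20, 5.0), (4.10, 2.0), (3.90, 1.5)]
yes, no = agreement_record(sample, higher_is_better=False)
print(yes, no, round(yes / (yes + no), 3))
```

For categories where a lower number is better (ERA, WHIP, walks per 9), pass `higher_is_better=False`; for strikeouts or winning percentage, leave the default.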

What is Value? I decided to use Fangraphs WAR as "true value". For purposes of this study, Fangraphs WAR (including batting) is considered to be an absolute and perfect statement of a pitcher’s true value. All other stats are measured by the consistency with which they agree with THIS stat, Fangraphs Wins Above Replacement adjusted to include batting.

The Pitching WAR given by Fangraphs does not include batting. A pitcher’s batting contribution is given on another page, and is stated relative to average, rather than relative to replacement. A pitcher in a DH league is thus at zero—no at bats. 65% of pitchers have batting contributions no larger than positive or negative 0.2.

I studied only pitchers pitching 199.0 to 201.0 innings so that innings pitched would be (essentially) a constant in the comparisons, and we would not be fighting workload differences to make accurate comparisons. Since different sources have different data, I learned in the course of the study that Fangraphs has innings pitched totals, for a handful of pitchers, which would put them outside the parameters of the study. But for purposes of illustration. . .Pedro Martinez, 2002, had obviously the best season ever by a pitcher who finished the season in the range of 199.0 to 201.0 innings. Martinez was 20-4 with a 2.26 ERA, striking out 239 batters and walking 40. Fangraphs credits Martinez with 7.8 WAR, 20% more than any other pitcher in the study. Another Red Sock, Josh Beckett in 2007, is second on the list at 6.5, which is total coincidence; I had no idea, when I designed the study, that two Red Sox of my era would be 1-2 on the list.

Anyhoo, Pedro is first not only in overall WAR, but in many or most of the individual categories. He has the highest strikeout total in the group, the best won-lost record, and leads in several other individual categories. When we compare Pedro’s strikeout rate to that of any other pitcher, then, this makes a yes/no contribution for Pedro of 238-0. Comparing Pedro to each of the other 238 pitchers, Pedro had a higher strikeout rate than each of them, and was also a better pitcher in fact than each of them. The two measurements always agree. 238 and 0.

This makes Martinez, in a sense, almost irrelevant to the study, since his contribution to the totals is very close to 238 and 0 in almost every category. The people who are most relevant to the study, most instructive in the results, are the pitchers who rank high in one area but low in another. If a pitcher is 10-17 but has a good ERA, or if he has a very good WHIP but a very poor strikeout to walk ratio, the study gets more information as to which category is more closely connected to true value.

I checked the "value matches" for twelve categories of performance. Almost alphabetically, those were:

Earned Run Average

Home Runs Allowed Per 9 Innings

Relative ERA (ERA compared to the league ERA)

Runs Allowed Per 9 Innings (Including un-earned runs)

Season Score (My own summary stat, combining a mix of pitching measurements into one number)

Strikeouts Per 9 Innings

Strikeout to Walk Ratio

Strikeout to Walk Margin

Walks Per 9 Innings

WHIP (Walks and Hits Per Inning)

Winning Percentage

Won-Lost Record

The won-lost records are scaled in this way: two times wins, minus losses, plus winning percentage. . .so that 15-7 is considered better (23.682) than 14-6 (22.700) but worse than 14-5 (23.737). Strikeout to walk margin is strikeouts minus walks, per 27 outs recorded.
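The two scaled measures described above can be written out as short functions. This is a sketch under stated assumptions: the function names are invented for the example, a pitcher with no decisions is assumed to get a winning percentage of zero, and innings are assumed to be passed as a true decimal (199 1/3 innings as 199.333, not the box-score shorthand 199.1).

```python
def won_lost_score(wins, losses):
    """Two times wins, minus losses, plus winning percentage."""
    decisions = wins + losses
    pct = wins / decisions if decisions else 0.0  # assumption: 0-0 counts as .000
    return 2 * wins - losses + pct

def strikeout_walk_margin(strikeouts, walks, innings):
    """Strikeouts minus walks, per 27 outs recorded (innings as a true decimal)."""
    outs = innings * 3
    return (strikeouts - walks) * 27 / outs

for wins, losses in [(15, 7), (14, 6), (14, 5)]:
    print(f"{wins}-{losses}: {won_lost_score(wins, losses):.3f}")

# 239 strikeouts and 40 walks over an assumed, illustrative 200 innings
print(f"{strikeout_walk_margin(239, 40, 200.0):.3f}")
```

The winning-percentage term only acts as a tiebreaker: it is always below 1, so it can separate 15-7 from 14-5 (both 23 before the fraction) but never outweigh a full point of wins or losses.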

In the initial part of my research, I systematically compared each pitcher to each other pitcher in Value and in the other category. I did this with a spreadsheet, which sped up the work, of course; it was still a tedious process, but it yielded a precise result given its assumptions. One category, for example, had a yes/no score of 16,781-10,945, a percentage of .605244. When you make all possible comparisons within a group of 239 players you have a potential total of 28,441 matches, but some of those are always ties, so you never quite get to 28,441.
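The arithmetic in that paragraph can be checked directly. The snippet below is just a verification of the quoted figures; the variable names are invented for the example, and reading the leftover pairs as ties follows the article's own remark that some comparisons are always ties.

```python
from math import comb

n_pitchers = 239
possible_matches = comb(n_pitchers, 2)  # each pitcher against each other pitcher, once
agree, disagree = 16_781, 10_945        # the example yes/no score quoted above
decided = agree + disagree

print(possible_matches)                 # 28441
print(possible_matches - decided)       # pairs with no yes/no answer (ties)
print(round(agree / decided, 6))        # 0.605244
```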

After doing a few categories, I realized that I could get essentially the same results much more quickly by doing a large number of random pitcher-to-pitcher comparisons. So in the middle of the study I switched processes: rather than doing every possible comparison one time, I did 100,000+ random matches, so that the measurement was accurate. It was at least five times faster for me to do that; let’s not get into why that was.
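To illustrate why random matches land close to the exhaustive figure, here is a small sketch that runs both approaches on the same group. Everything in it is an assumption for demonstration purposes: the data are synthetic (a made-up "value" plus noise standing in for a category), the function names are invented, and the 100,000 match count simply mirrors the number mentioned above. A sampled percentage is an estimate, so it will be near the exhaustive figure rather than identical to it.

```python
import random
from itertools import combinations

def agrees(a, b, higher_is_better=True):
    """True/False for agree/disagree, or None if either measure is tied."""
    (cat_a, val_a), (cat_b, val_b) = a, b
    if cat_a == cat_b or val_a == val_b:
        return None
    return ((cat_a > cat_b) == higher_is_better) == (val_a > val_b)

def exact_agreement(pitchers):
    """Agreement rate over every possible pair, ties excluded."""
    results = [agrees(a, b) for a, b in combinations(pitchers, 2)]
    decided = [r for r in results if r is not None]
    return sum(decided) / len(decided)

def sampled_agreement(pitchers, n_matches=100_000, seed=0):
    """Agreement rate over randomly drawn pairs of distinct pitchers."""
    rng = random.Random(seed)
    results = [agrees(*rng.sample(pitchers, 2)) for _ in range(n_matches)]
    decided = [r for r in results if r is not None]
    return sum(decided) / len(decided)

# Synthetic 239-pitcher group: (category, value), where the category is
# the value plus random noise. Not real pitcher data.
data_rng = random.Random(1)
group = []
for _ in range(239):
    value = data_rng.gauss(3.0, 1.5)
    group.append((value + data_rng.gauss(0.0, 1.5), value))

exact = exact_agreement(group)
sampled = sampled_agreement(group)
print(round(exact, 4), round(sampled, 4))
```

With 100,000 draws the sampling error on a percentage is on the order of a couple of thousandths, which is small next to the gaps one would hope to see between categories.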

Anyway, my purpose today is just to ask you which categories you think will be most important, and which you think will turn out to be least important? Comments invited.

## COMMENTS (29 Comments, most recent shown first)

ultimate777

I think it's ERA, no-brainer. A lousy team can drag a pitcher down. A great team can build you up. If you have an ERA of 2.00 but your team averages one run per 9 innings when you pitch, where are you regarding W-L? Likewise, if you have an ERA of 5.00 and your team scores 7.00 for you?

Imagine in 1972 if Steve Carlton was pitching for the Reds or the A's?

8:20 AM Mar 12th

Arrojo

I am commenting here without yet having read the followup article with the answers. From the old Bill James Abstracts, I learned two important things relating to this question.

1. K/BB ratio in one season is the single most important indicator of a pitcher's effectiveness in the next season.

2. K/9 innings is the single most important indicator for a pitcher's future longevity in the majors.

Hopefully, I've been remembering those conclusions correctly.

9:31 AM Feb 26th

schwarze

I always assumed K/BB was the "best" pitching stat. But I later learned that K/9 is better. Give me the guy who gets 12 K per 9 but may walk 3 per 9 over the control artist whose K/BB is 5 but is that high because he walks 1 per 9.

8:54 AM Feb 26th

StatsGuru

It would be cool if W-L record surprised us. :-)

Since f-WAR is based on FIP, some combination of the three-true outcome categories should come out on top. While I understand the allure of home runs, there are good pitchers who give up quite a few. They limit the damage due to their great K and BB rates.

By the way, why do we never see K/HR? Anyone with a high ratio here should have a low BA allowed, since the high K rate will mitigate any lack of luck on balls in play, and the low HR rate would mean HR don't add to batting average.

6:08 AM Feb 26th

chuck

strikeout margin, with strikeouts/9 not far behind.

12:58 AM Feb 26th

colbycosh

Am I the only one pushing all his chips onto pure K/9?

12:25 AM Feb 26th

evanecurb

Grittiness Quotient (GQ)

Wants the Ball When the Chips are Down (WBWCD)

Value of Intangibles (VOI)

I've heard on several broadcasts that these are the most important measures of a pitcher's value. Surprised that they weren't included in the study.

7:00 PM Feb 25th

DaveFleming

What can I say, hotstatrat? I like to be an iconoclast.

Really, my guess was based on Rick Reuschel. He's a pitcher who:

-Does very well by the WAR metric,

-Doesn't do so great in most of the other stats listed (K-rate, K/BB, ERA, ERA+, W-L record), and,

-Was exceptionally good at preventing HR's.

It's just a hunch. I can't wait for tomorrow!

6:47 PM Feb 25th

hotstatrat

This should be helpful. From August 2007 in Bill's article on Season Scores:

The pitcher’s season score is the sum of three parts, which are:

Part I—Decisions

10 times wins, plus 3 times saves, minus 5 times losses

Part II—Earned Runs

Earned Runs Saved as compared to a pitcher pitching the same number of innings with an ERA of 5.00

Part III—Strikeouts and Walks

2 times Strikeouts, Minus 3 times Walks, the total divided by 3.

That makes Season Score a very good fit. The W-L record measuring in there might throw it off - make it less than K/9 - W/9. But probably not enough. My revised guess:

season score (the rest in the same order as before)

strikeout to walk margin

strikeout to walk ratio

ERA relative to the league (risky, because it is unrelated)

ERA

WHIP (BABiP probably not too variable over 200 innings)

K/9 (just going by experience here:)

W-L pct. (high K guys tend to be actually better than high win guys)

W-L record (with innings constant, high wins is almost meaningless)

W/9 (these last two are components of fWAR)

HR/9 (but small N)

Wow, Dave Fleming picked my last choice. This will be fun and interesting.

5:09 PM Feb 25th

78sman

I would say that the 3 most important in order are:

strikeout to walk ratio

strikeout to walk margin

HR per 9 innings

4:57 PM Feb 25th

steve161

All the while I was reading this article I had this nagging feeling I'd read something like it before. I was away from home afterwards for a few hours and couldn't get it out of my head. Finally I remembered this from the Reader Posts, starting 11 Nov 13:

boards.billjamesonline.com/showthread.php?3548-what-won-in-the-postseason-part-one-1920-68

Here chuck looks at a number of metrics to see which ones correlate best with success in postseason series. There is also a second part posted a couple of weeks later at:

boards.billjamesonline.com/showthread.php?3571-what-won-in-the-postseason-part-two-1920-1993

chuck is looking only at the postseason. He's also examining both pitching and hitting metrics. As he himself notes, the sample size is probably too small to draw any real conclusions, but there are some fascinating tidbits of information.

Meanwhile I can hardly wait for tomorrow.

4:11 PM Feb 25th

OldBackstop

All right, I've calmed down. I'd say:

Season Score (maybe Bill's being a homer)

Strikeouts Per 9 Innings

Winning Percentage

Relative ERA (ERA compared to the league ERA)

Earned Run Average

Won-Lost Record

Runs Allowed Per 9 Innings (Including un-earned runs)

Home Runs Allowed Per 9 Innings

WHIP (Walks and Hits Per Inning)

Strikeout to Walk Ratio

Strikeout to Walk Margin

Walks Per 9 inning

3:23 PM Feb 25th

DaveFleming

I'm going to go with....

Home Runs Allowed Per 9 Innings.

Just a hunch. Home runs allowed are the worst outcome in the pitcher-hitter confrontation, so I'll hazard that that turns out to be the most important.

As far as least important? I'll guess WHIP.

1:53 PM Feb 25th

tangotiger

bryan is right that since Bill limited it to IP of 199-201, then this basically removes a huge variable.

fWAR is, basically, this:

fWAR = IP/9 * (1.2*LeagueFIP - PlayerFIP) / 10

That "1.2" is more like 1.05 for relief pitchers and 1.28 for starting pitchers. But since Bill controlled for IP, this constant cancels out for all the pitchers, and the IP/9 also cancels out, as does the /10 term.

The only thing left is the LeagueFIP, and LeagueFIP by definition is also LeagueERA. That's really the only wildcard.

As I said, more interesting will be all those metrics cited that don't explicitly include K, BB, HR.

12:44 PM Feb 25th

OldBackstop

After you referred to the singular of Red Sox as "Red Sock" I snorted through my nose and swept all the papers off my desk. So no answer, other than I have a feeling your Season Score won't be treated like a red-haired step-child in this one.

11:50 AM Feb 25th

mrbryan

I thought I was clever thinking it would be strikeout to walk margin, and then I noticed that it is pretty much the overwhelming choice. Anyway, chalk one more up for strikeout to walk margin.

Margin is incredibly effective in this case, I think, because it actually becomes so/inning - bb/inning, with the innings essentially losing their value because they are all between 199 and 201. You're packing a lot of information into that seemingly straightforward margin stat.

11:38 AM Feb 25th

hotstatrat

My first thought was exactly what Tango suggested - that it will be whatever corresponds closest to how fWAR is calculated. I'll go with this order:

strikeout to walk margin

strikeout to walk ratio

ERA relative to the league (risky, because it is unrelated)

ERA

season score (although, I forgot what is in it)

WHIP (BABiP probably not too variable over 200 innings)

K/9 (just going by experience here:)

W-L pct. (high K guys tend to be actually better than high win guys)

W-L record (with innings constant, high wins is almost meaningless)

W/9 (these last two are components of fWAR)

HR/9 (but small N)

11:24 AM Feb 25th

David Kowalski

ERA or won-lost: ERA. I remember the Orioles grabbing Mike Cuellar from Houston around 45 years ago. Cuellar's won-lost record was so-so. His ERA was very good. Mike Cuellar was a very good pitcher for the Orioles. It answered my naive question, why did they trade for this guy for their rotation.

Strikeouts or walks. Walks. Nolan Ryan wasn't so hot when he had a lot of walks. Walks tend to come in streaks and Ryan would start taking something off his fastball when he had two runners on base. Two walks meant he was going to be battered (this is based on games at Fenway Park, by the way).

K/BB or WHIP. Definitely K/BB. That ratio will identify either the pitcher with better stuff or the pitcher with great control. WHIP doesn't differentiate between a three-run homer and a meaningless single. In fact, by WHIP, two meaningless singles are worse than the three-run dinger.

ERA relative to the league or WHIP. ERA relative to the league. Exhibit A is the National League in 1930, when the league ERA was an awful 4.97. Exhibit B is 1968, when the NL ERA averaged 2.99. This doesn't even go back to the dead ball era. The 1930 Phillies had awful starting pitching but in context it was less awful than it seemed. Of course, it was still awful.

10:58 AM Feb 25th

tangotiger

Fangraphs version of WAR (fWAR) for pitchers is, at its core, FIP.

And FIP is:

(13*HR + 3*BB - 2*SO) / IP plus some constant

Therefore, of those listed by Bill, I would expect Strikeout to Walk margin (i.e., SO-BB), to be the one that most closely matches fWAR.

***

Had Bill used the Baseball Reference version of WAR (rWAR), then it would have been Runs Allowed per 9IP, since, at its core, that's the starting point of rWAR.

***

So, the real fun will be to look at those things that fWAR does not use explicitly (like runs allowed and W/L) and see which do better.

10:38 AM Feb 25th

Jack

Here's how I'd rank the categories, from most to least important:

1. Season score

2. K/9

3. K-BB margin

4. K:W

5. HR/9

6. W/9

7. WHIP

8. ERA+

9. RA/9

10. ERA

11. WIN%

12. W-L

- Jack

10:25 AM Feb 25th

Edward

Relative ERA.

9:52 AM Feb 25th

bearbyz

Strikeout to walk margin.

9:07 AM Feb 25th

colinb

I would guess that season score and relative ERA would be the two best indicators.

8:15 AM Feb 25th

mathias2

My guess is that strikeout to walk margin will correlate best with value.

8:04 AM Feb 25th

studes

Remember that Fangraphs' WAR "neuters" the impact of all fielded balls for pitchers, so that a pitcher who strikes out a lot of batters, doesn't walk them and doesn't give up home runs will do best. The impact of all non-HR batted ball outcomes is equalized among pitchers.

7:58 AM Feb 25th

garywmaloney

Either Runs Allowed per inning or Relative ERA.

7:43 AM Feb 25th

CharlesSaeger

I'm going with won-lost record since I just think there will be a surprise somewhere.

7:23 AM Feb 25th

taosjohn

I'd guess season score will do best, k/9 second best, W-L worst.

7:23 AM Feb 25th

greggborgeson

I'm going with strikeout to walk margin. Eager to see part II.

6:16 AM Feb 25th