1. Setup
This is a two-part article. Today I am going to explain the concept and solicit your comments anticipating the results, and tomorrow I will tell you what the results of the study were.
Which pitching stats are most closely connected to value? Suppose that you know two facts about two pitchers: each pitcher’s ERA, and his won-lost record. Suppose that the two disagree; one pitcher has the better won-lost record, but the other has the better ERA. Which one is more likely to be the better pitcher in fact?
Suppose that you know his strikeouts, or his walks. Which would you rather have? Suppose that you know his strikeout to walk ratio, or his WHIP (walks plus hits per inning pitched). Which would be better? Suppose that you know his ERA relative to the league, or his WHIP?
I have performed a set of studies to address these issues. This is what I did. I took all pitchers in history who pitched 199.0 to 201.0 innings in a season. That is a total of 239 pitchers. Then I compared each pitcher to each other pitcher, asking three questions:
1) Which pitcher is better in this category?
2) Which pitcher is better in fact? and
3) Do these two agree?
The more often a category agrees with the bottom line, the more closely it is connected to Value.
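The three questions above can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet used in the study. The function name `agreement`, the dictionary keys, and the four pitchers are all made up, and skipping a pair when either measure is tied is an assumption.

```python
from itertools import combinations

def agreement(pitchers, category, value, higher_is_better=True):
    """Count pairs where the category and the value measure pick the same pitcher.

    Returns (yes, no). Pairs tied in either measure are skipped (an assumption).
    """
    yes = no = 0
    for a, b in combinations(pitchers, 2):
        cat_diff = a[category] - b[category]
        val_diff = a[value] - b[value]
        if not higher_is_better:
            # For stats like ERA, the lower number is the better one.
            cat_diff = -cat_diff
        if cat_diff == 0 or val_diff == 0:
            continue
        if (cat_diff > 0) == (val_diff > 0):
            yes += 1
        else:
            no += 1
    return yes, no

# Four invented pitchers, purely for illustration.
sample = [
    {"name": "A", "era": 2.50, "war": 6.0},
    {"name": "B", "era": 3.10, "war": 4.5},
    {"name": "C", "era": 3.60, "war": 5.0},
    {"name": "D", "era": 4.20, "war": 2.0},
]
yes, no = agreement(sample, "era", "war", higher_is_better=False)
print(yes, no)  # 5 1
```

In this toy group the only disagreement is B against C: B has the better ERA, but C has the higher value.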
What is Value? I decided to use Fangraphs WAR as "true value". For purposes of this study, Fangraphs WAR (including batting) is considered to be an absolute and perfect statement of a pitcher’s true value. All other stats are measured by the consistency with which they agree with THIS stat, Fangraphs Wins Above Replacement adjusted to include batting.
The Pitching WAR given by Fangraphs does not include batting. A pitcher’s batting contribution is given on another page, and is stated relative to average, rather than relative to replacement. A pitcher in a DH league is thus at zero, having no at bats. 65% of pitchers have batting contributions within 0.2 of zero, positive or negative.
I studied only pitchers pitching 199.0 to 201.0 innings so that innings pitched would be (essentially) a constant in the comparisons, and we would not be fighting workload differences to make accurate comparisons. Since different sources have different data, I learned in the course of the study that Fangraphs has innings pitched totals, for a handful of pitchers, which would put them outside the parameters of the study. But for purposes of illustration. . .Pedro Martinez, 2002, had obviously the best season ever by a pitcher who finished the season in the range of 199.0 to 201.0 innings. Martinez was 20-4 with a 2.26 ERA, striking out 239 batters and walking 40. Fangraphs credits Martinez with 7.8 WAR, 20% more than any other pitcher in the study. Another Red Sock, Josh Beckett in 2007, is second on the list at 6.5, which is total coincidence; I had no idea, when I designed the study, that two Red Sox of my era would be 1-2 on the list.
Anyhoo, Pedro is first not only in overall WAR, but in many or most of the individual categories. He has the highest strikeout total in the group, the best won-lost record, and leads in several other individual categories. When we compare Pedro’s strikeout rate to that of any other pitcher, then, this makes a yes/no contribution for Pedro of 238-0. Comparing Pedro to each of the other 238 pitchers, Pedro had a higher strikeout rate than each of them, and was also a better pitcher in fact than each of them. The two measurements always agree. 238 and 0.
This makes Martinez, in a sense, almost irrelevant to the study, since his contribution to the totals is very close to 238 and 0 in almost every category. The people who are most relevant to the study, most instructive in the results, are the pitchers who rank high in one area but low in another. If a pitcher is 10-17 but has a good ERA, or if he has a very good WHIP but a very poor strikeout to walk ratio, the study gets more information as to which category is more closely connected to true value.
I checked the "value matches" for twelve categories of performance. Almost alphabetically, those were:
Earned Run Average
Home Runs Allowed Per 9 Innings
Relative ERA (ERA compared to the league ERA)
Runs Allowed Per 9 Innings (Including un-earned runs)
Season Score (My own summary stat, combining a mix of pitching measurements into one number)
Strikeouts Per 9 Innings
Strikeout to Walk Ratio
Strikeout to Walk Margin
Walks Per 9 Innings
WHIP (Walks and Hits Per Inning)
Winning Percentage
Won-Lost Record
The won-lost records are scaled in this way: twice the wins, minus the losses, plus the winning percentage, so that 15-7 is considered better (23.682) than 14-6 (22.700) but worse than 14-5 (23.737). Strikeout to walk margin is strikeouts minus walks, per 27 outs recorded.
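Both scales are simple enough to write out. The following is a minimal sketch. The function names are hypothetical, and the 600 outs (200 innings) in the second example is a round number chosen for illustration, not any pitcher's actual total.

```python
def won_lost_score(wins, losses):
    """Twice the wins, minus the losses, plus the winning percentage."""
    decisions = wins + losses
    pct = wins / decisions if decisions else 0.0
    return 2 * wins - losses + pct

def k_bb_margin(strikeouts, walks, outs):
    """Strikeouts minus walks, per 27 outs recorded."""
    return (strikeouts - walks) * 27 / outs

print(round(won_lost_score(15, 7), 3))  # 23.682
print(round(won_lost_score(14, 5), 3))  # 23.737

# 239 strikeouts and 40 walks over an assumed 600 outs.
print(round(k_bb_margin(239, 40, 600), 3))  # 8.955
```

The winning percentage never reaches a full point unless the pitcher is undefeated, so it mostly serves to break ties between records with the same value of twice wins minus losses, as with 15-7 and 14-5 above.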
In the initial part of my research, I systematically compared each pitcher to each other pitcher in Value and in the other category. I did this with a spreadsheet that sped up the work, of course. It was still a tedious process, but it yielded a precise result given its assumptions. One category, for example, had a yes/no score of 16,781-10,945, a percentage of .605244. When you make all possible comparisons within a group of 239 players you have a potential total of 28,441 matches, but some of those are always ties, so you never quite get to 28,441.
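The arithmetic in that paragraph can be checked directly. This sketch uses only the numbers quoted above. Reading the gap between the potential total and the decided matches as ties is an inference from the text.

```python
import math

# Every pitcher against every other pitcher, each pair counted once.
potential = math.comb(239, 2)
print(potential)  # 28441

yes, no = 16781, 10945
decided = yes + no
print(decided)                 # 27726
print(round(yes / decided, 6)) # 0.605244
print(potential - decided)     # 715 pairs left over, presumably ties
```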
After doing a few categories, I realized that I could get essentially the same results much more quickly by doing a large number of random pitcher-to-pitcher comparisons. So in the middle of the study I switched from doing every possible comparison one time to doing 100,000+ random matches, enough that the measurement was accurate. It was at least five times faster for me to do that; let’s not get into why that was.
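To see why random matches land so close to the full comparison, here is a small simulation. It is a sketch on invented data, not the study's data or method. The 239 "pitchers" are random numbers, and the names `agrees`, `exhaustive_rate`, and `sampled_rate` are made up for the example.

```python
import random
from itertools import combinations

def agrees(a, b):
    """True or False if category and value agree for this pair; None on a tie."""
    cat_diff = a["cat"] - b["cat"]
    val_diff = a["value"] - b["value"]
    if cat_diff == 0 or val_diff == 0:
        return None
    return (cat_diff > 0) == (val_diff > 0)

def exhaustive_rate(pitchers):
    """Agreement rate over every possible pair, ties excluded."""
    results = [agrees(a, b) for a, b in combinations(pitchers, 2)]
    decided = [r for r in results if r is not None]
    return sum(decided) / len(decided)

def sampled_rate(pitchers, n_matches, rng):
    """Agreement rate over randomly drawn pairs, ties excluded."""
    yes = decided = 0
    for _ in range(n_matches):
        a, b = rng.sample(pitchers, 2)  # two distinct pitchers
        r = agrees(a, b)
        if r is not None:
            decided += 1
            yes += r
    return yes / decided

# Invented data: a "true value" plus a noisy category that tracks it loosely.
rng = random.Random(0)
pitchers = []
for _ in range(239):
    value = rng.gauss(0, 1)
    pitchers.append({"value": value, "cat": value + rng.gauss(0, 2)})

exact = exhaustive_rate(pitchers)
estimate = sampled_rate(pitchers, 100_000, rng)
print(round(exact, 4), round(estimate, 4))
```

With 100,000 draws the sampling error on a percentage is on the order of a couple of thousandths, so the estimate agrees with the exhaustive figure closely but not digit for digit.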
Anyway, my purpose today is just to ask you which categories you think will be most important, and which you think will turn out to be least important? Comments invited.