BILL JAMES ONLINE

What W-L Record does the pitcher deserve?

August 8, 2017
               

2017-35

Explaining the System

              At some point I have to pause and explain how I estimate the won-lost record that the pitcher deserved.   The way I would have done this in the 1980s would have been to take the league average of runs scored per game, and count that as the runs scored for the pitcher, and take his runs allowed, count that as the runs scored against the pitcher, apply the Pythagorean method to get a winning percentage, and then apply that winning percentage to the pitcher’s number of decisions.    That is still essentially what I have done; those are the broad outlines of the method.   I’ll refer to that, in the description below, as the crude method.  But there are several little wrinkles that have been added to the method, so I need to explain those today.   Let’s see. . .who should I use as the test cases to illustrate the process?   How about the 1985 Cy Young Award winners, Doc Gooden and Bret Saberhagen?

              The American League in 1985 scored 4.56 runs per game, and Saberhagen allowed 3.02 runs per nine innings.   Applying the Pythagorean Method, that creates an expected winning percentage of .695.   Multiplying Saberhagen’s decisions—26—by a .695 expected winning percentage, we get an expected won-lost record, for Saberhagen, of 18-8.   This would be the crude estimate of his deserved won-lost record.  

              The National League in 1985 scored 4.07 runs per game, and Dwight Gooden allowed 1.66 runs per nine innings.   Applying the Pythagorean Method, that creates an expected winning percentage of .857, which, remarkably enough, was Gooden’s actual 1985 winning percentage, .857.    So, re-applying this to his 28 decisions, Gooden’s deserved won-lost record, in the crude approach, is his actual won-lost record, 24 and 4.
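              For readers who want to follow along at a keyboard, here is a minimal sketch of the crude method in Python.  It is my sketch, not Bill's actual code; the only inputs are the numbers quoted above.

def pythagorean_pct(runs_for, runs_against, exponent=2.0):
    # Pythagorean winning percentage: RF^x / (RF^x + RA^x)
    return runs_for ** exponent / (runs_for ** exponent + runs_against ** exponent)

# Crude estimates: league runs per game as support, runs allowed per nine innings against
saberhagen_wins = pythagorean_pct(4.56, 3.02) * 26   # about 18.1, so 18-8
gooden_wins     = pythagorean_pct(4.07, 1.66) * 28   # about 24.0, so 24-4
print(round(saberhagen_wins, 1), round(gooden_wins, 1))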

              We will advance this method in six ways:

              1)  We will use runs scored by the league per nine innings, rather than runs scored per game,

              2)  We will park-adjust the runs scored by the league for the park in which the pitcher pitched,

              3)  We will make a "second estimate" of the pitcher’s effectiveness, the first estimate being his runs allowed,

              4)  We will combine the two estimates into one,

              5)  In assigning a number of decisions to the pitcher, we will assign those based on the norms of the time, rather than simply accepting the number of decisions that the pitcher happened to have, and

              6)  We need to adjust the expected wins by the starting pitcher era by era for small discrepancies between expected and actual wins by starting pitchers; I’ll explain later. 

 

 

              1)  The American League in 1985 scored 4.56 runs per game, but 4.60 runs per nine innings.   The National League scored 4.07 runs per game, and also 4.07 per nine innings. . . well, the number actually changes from 4.067 to 4.068, but it’s not much of a change.

              2)  The Park Factor for Kansas City in 1985 was 101, which actually means 1.01, which means that the Royals scored and allowed 1% more runs per inning in their home park than they did on the road.  

              There are two numbers here, the Park Factor and the Park Adjustment.  You have to adjust the league run context for the effects of the park, remembering (a) that only one-half of the team’s games are played in that park, and (b) that the Royals didn’t have Royals Stadium (now Kauffman Stadium) among their "road" parks, while every other team in the league did. 

              The formula by which we derive the Park Adjustment from the Park Factor is  

              (N-1) * PF + (N – PF)

              ---------------------------

                             2 * (N-1)

 

              Where N is the number of teams in the league, and PF is the Park Factor.   For the Royals in 1985, this would be

              13 * 1.01 + (14 – 1.01)

              -------------------------------

                                 26

 

              Which is 13.13 + 12.99 = 26.12, divided by 26, which is 1.0046.   In essence, we are saying that there are two sets of games here, the home games and the road games, which are equal in number.   The "home" games are represented by 13 * 1.01, or 13.13; the road games are represented by 14 – 1.01, or 12.99.   A park which increases offense by 1%, in a 14-team league, increases the offensive context for that team by .0046.  (I think that Pete Palmer actually invented that formula, but I always think of it as my own because I never remember the formula; I just think about how this should be done.   When you think it through you wind up with that formula, the one that Pete invented.) 

              Shea Stadium in 1985 had a Park Factor of 90, which, in a 12-team league, creates a Park Adjustment of .954545.   I hope that makes sense. 
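              Here is a small sketch of that calculation in Python (again mine, not Bill's), with the Royals and Shea numbers above used as checks:

def park_adjustment(park_factor, teams):
    # Convert a Park Factor such as 101 (meaning 1.01) into a Park Adjustment.
    # Half the schedule is played in the park itself; the other half is played
    # in the other (teams - 1) parks, which do not include this one.
    pf = park_factor / 100.0
    home = (teams - 1) * pf      # the home half
    road = teams - pf            # the road half
    return (home + road) / (2 * (teams - 1))

print(park_adjustment(101, 14))   # Royals 1985: about 1.0046
print(park_adjustment(90, 12))    # Shea 1985: about 0.954545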

              It is fair to wonder whether a three-digit or four-digit Park Factor would be more accurate.   In truth, though, it makes so little difference that it is really a waste of time to use a three-digit Park Factor.  

              So the American League in 1985 scored/allowed 4.60 runs per nine innings, but adjusted for the Park, the Royals’ number is 4.62.   The National League scored/allowed 4.07 runs per nine innings, but the Mets’ park-adjusted number is 3.88.  

              For the sake of clarity, I could have and would have made this park adjustment in 1985, had I been writing this study then.   What I did not have in 1985 was Park Factors for every team in baseball history, or an organized database by which to apply them to every pitcher.   I am able to do that now; I wouldn’t have been able to do that fifteen or twenty years ago.

              3)  The second estimate of the pitcher’s effectiveness, the first estimate being his runs allowed, is based on his three true outcomes.   You may remember that in yesterday’s article I observed that, while Bob Gibson in 1968 obviously was a spectacularly effective pitcher, he was not SO effective that he could be expected to post a 1.12 ERA, given his other stats—his strikeouts, walks, hit batsmen and home runs allowed.  

              How do we do this?

              With the Runs Created Formula, which is:

              (Hits + Walks + Hit Batsmen)  * Total Bases

              -------------------------------------------------------

                                           Plate Appearances

             

              We know what the pitcher’s Walks, Hit Batsmen and Plate Appearances are, from his official pitching line.   That leaves Hits and Total Bases.  

              We don’t want to use the pitcher’s actual Hits Allowed to represent "Hits" in this formula, because, as Voros McCracken demonstrated, there is a lot of luck in how many of the Balls In Play against a pitcher become Hits.    We’re trying to take the luck OUT of the record, as much as we can.  What we use instead is his Balls In Play, multiplied by the major league percentage of Balls in Play for the season which became hits.  

              The percentage of Balls in Play which became hits in 1985 was .27709 (.27713 if figured from the batters’ stats. . .not sure what the difference is.)    We multiply Saberhagen’s and Gooden’s Balls in Play by that number.  To get each pitcher’s expected hits allowed, we then add his home runs allowed to this number.    (The ball in play average was .279 in the AL, .275 in the NL.   You can figure it by league if you want to.   But (a) the difference between the leagues is usually tiny, and (b) it isn’t clear that it’s a "real" difference, as opposed to a random difference.   So I use the major league number, which does vary some over time, although not a lot.) 

              For total bases allowed, we figure the major league average of bases per hit, excluding home runs.   For 1985, this is 1.262.   We then multiply his non-home run hits by this number, and add 4 times the home runs.   This is his expected total bases allowed.  

              These numbers we run through the Runs Created formula, to figure how many runs created the pitcher has allowed.   At this point, though, we have not adjusted for his Wild Pitches and Balks.   Let’s assume that each Wild Pitch that he throws and each Balk he commits has a value of 0.25 runs.   We add these to his runs created.  
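              Pulling those pieces together, a sketch of the expected-runs-allowed calculation might look like this.  It is my reconstruction of the steps described above, not Bill's code; the league constants are the 1985 figures quoted in this section, and the inputs are items from the pitcher's official line (batters faced, strikeouts, walks, hit batsmen, homers, wild pitches, balks).

def expected_runs_allowed(bf, so, bb, hbp, hr, wp, bk,
                          lg_bip_avg=0.27709, lg_bases_per_hit=1.262):
    # Balls in play: batters faced minus the three true outcomes (a rough definition)
    balls_in_play = bf - so - bb - hbp - hr
    # Expected hits: league hit rate on balls in play, plus the homers allowed
    exp_hits = balls_in_play * lg_bip_avg + hr
    # Expected total bases: league bases per non-HR hit, plus four bases per homer
    exp_total_bases = (exp_hits - hr) * lg_bases_per_hit + 4 * hr
    # Runs Created: (Hits + Walks + Hit Batsmen) * Total Bases / Plate Appearances
    runs_created = (exp_hits + bb + hbp) * exp_total_bases / bf
    # Charge wild pitches and balks at a quarter of a run each
    return runs_created + 0.25 * (wp + bk)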

              Now we have an "expected runs allowed" number for the pitcher which is, on average, the same as his runs allowed.   If you take a thousand pitchers and figure their expected runs allowed and their actual runs allowed, they should be very nearly the same.

              In an individual case, however, they may be somewhat different.    For Saberhagen and Gooden, as for most Cy Young Award winners, the number of runs that the pitcher could have been expected to allow is somewhat higher than the number that he actually did allow.   Saberhagen allowed 79 runs, but could have been expected to allow 89.   Gooden allowed 51 runs, but could have been expected to allow 81. 

              4)  We combine these two figures into one, weighting the expected runs allowed at one third, and the actual runs allowed at two-thirds.   Doing this, Gooden’s runs allowed are adjusted from 51 to 61, and Saberhagen’s from 79 to 82.  
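              The blend itself is one line of arithmetic; a sketch, using the figures just given:

def blended_runs(actual_runs, expected_runs):
    # Two-thirds weight on actual runs allowed, one-third on the expectation
    return (2 * actual_runs + expected_runs) / 3

print(round(blended_runs(51, 81)))   # Gooden: 51 adjusted to 61
print(round(blended_runs(79, 89)))   # Saberhagen: 79 adjusted to about 82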

              Let’s pause for a moment to explain or debate this adjustment.   This deviation between expected runs allowed based on the three true outcomes and actual runs allowed could be either luck, or it could be attributable to performance variables that we have not measured.   There are at least four traits of a pitcher which are not measured in the three true outcomes: 

              a)  The pitcher’s fielding, including his ability to prevent stolen bases,

              b)   The pitcher may have pitched well at crucial moments, leading to a low batting average with runners in scoring position or a low on base percentage leading off innings,

              c)  The pitcher may get more ground balls (or fewer) than average, resulting in an unusual relationship of runs allowed to strikeouts, walks and home runs allowed, 

              d)  A pitcher could have an ability to induce weak contact.

              If you evaluate the pitcher entirely on the three true outcomes—which I believe one of the WAR systems does—that assumes that all discrepancies between expected and actual runs allowed are just luck.   On the other hand, if you ignore the three true outcomes, and base your evaluation entirely on the pitcher’s runs allowed, that assumes that there has been NO luck involved in how many runs the pitcher allowed.

              Well, but we know that there is SOME luck involved here.    If these discrepancies occurred entirely because of the pitcher’s skill, they should have a strong consistency from year to year.   In fact, there is very little consistency in those discrepancies from year to year.   A pitcher who allows 20 hits fewer than expected one year will allow 20 hits more than expected the next year.   A pitcher who allows a .180 batting average with runners in scoring position one year will allow a .330 batting average with runners in scoring position the next year, with strikeouts and walks which are about the same.  Bob Gibson allowed a .141 batting average with runners in scoring position in 1968, but a .239 batting average in those situations in 1970, when he again won the National League Cy Young Award.  It is not reasonable to believe that these things happen because of year to year variations in the pitcher’s skill.

              A pitcher’s ability to get ground balls is a real thing, but it is a real thing which has been tremendously exaggerated, both by announcers and by sabermetric analysts.   It’s actually not a very large advantage.   With the exception of a handful of pitchers in each generation, it’s basically nothing.   There is no evidence yet that any pitcher has an ability to induce weak contact, as opposed to that being simply something that happens in a good year.  The question is, though:  in combining the runs allowed by the pitcher with the expected runs allowed, what weight should be given to each one?

              It is my view that in combining a real-world outcome with a theoretical measurement, most of the weight must of course be given to the real-world outcome, even though the theoretical measurement would do a better job of predicting future outcomes.   I have elected to weight the real outcome at two-thirds, and the theoretical expectation at one-third.   If you do your own study, you can do it as you see fit. 

 

              5)  In cruder versions of this approach, I would either (a) base the pitcher’s expected won-lost record on his actual number of decisions, or (b) assign him one expected decision for each nine innings pitched.   In this iteration of the project, I assigned each pitcher a number of decisions based on the normal ratio of innings pitched to decisions by a starting pitcher in his era.  

              In the 1980s/1990s, starting pitchers typically were assigned one decision for each 26.23 outs recorded.   (In the 1960s/1970s, this number was 26.32; in the 21st century it is 25.54.   For some reason, starting pitchers now get slightly more decisions per inning pitched.   Comparing a pitcher pitching 200 innings in the 1980s and a pitcher pitching 200 innings today, there is an expected increase of about 0.6 decisions.)  So anyway, we assign Doc Gooden in 1985 responsibility for 31.6 decisions, and Saberhagen responsibility for 26.9. 
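              A sketch of the decision assignment, using the era constants just quoted; the names and structure are mine, not Bill's, and the innings totals are the two pitchers' actual 1985 innings (about 276.2 and 235.1).

OUTS_PER_DECISION = {"1960s-70s": 26.32, "1980s-90s": 26.23, "2000s": 25.54}

def assigned_decisions(innings_pitched, era="1980s-90s"):
    # Convert innings pitched to an era-normal number of decisions
    return innings_pitched * 3 / OUTS_PER_DECISION[era]

print(round(assigned_decisions(276 + 2/3), 1))   # Gooden 1985: about 31.6
print(round(assigned_decisions(235 + 1/3), 1))   # Saberhagen 1985: about 26.9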

              OK, now we figure the Pythagorean winning percentage, as we did before, only we have park-adjusted the "runs for" and adjusted the "runs against" to accommodate discrepancies between actual runs allowed and the pitcher’s other statistics.    For Gooden, we now have 3.88 runs per game for him, and 1.99 runs per game against him.   It makes a .792 deserved winning percentage.   For Saberhagen, we now have 4.62 runs per game for him (his context run level), and 3.15 per game against him, a .683 percentage. 

              Multiplying the .792 winning percentage for Gooden by the 31.6 expected decisions, we come up with 25.067 expected wins for Gooden, and 18.377 for Saberhagen.   But we are not quite at the end of the road here.

              6)  We need to adjust the expected wins by the starting pitcher for small discrepancies between expected and actual wins by starting pitchers.    The norms in the game change a little bit all the time.   I figured the actual and "deserved" wins for all of the pitchers in each era of baseball history.   In the 1980 to 1999 period, starting pitchers had more wins than expected, 4/10ths of one percent more.   This discrepancy could have something to do with the use of the bullpen in that era, or it could be random, or it could be due to a flaw in the process somewhere, but in any case I don’t want it there, so I adjusted it out of existence by multiplying pitchers’ wins as figured before by 1.004.    This increases Gooden’s "Deserved Wins" from 25.07 to 25.17, and increases Saberhagen’s from 18.38 to 18.45.  

              Subtracting the Deserved Wins from the expected decisions, then, Gooden has 6.48 Deserved Losses, and Saberhagen has 8.47.    Gooden has a deserved won-lost record of 25-6, and Saberhagen of 18-8, which is the same as Saberhagen’s expected won-lost record in the crude analysis.
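              Finally, pulling steps 4 through 6 together, a rough end-to-end sketch for the two award winners.  Again this is my code, not Bill's, and small rounding differences from the figures in the text are to be expected.

def pythagorean_pct(runs_for, runs_against):
    return runs_for ** 2 / (runs_for ** 2 + runs_against ** 2)

def deserved_record(context_runs, adjusted_runs, innings, decisions, era_factor):
    # Park-adjusted run context for the pitcher, blended runs allowed against him,
    # era-normal decisions, and the era-level correction to expected wins
    ra_per_nine = adjusted_runs * 9 / innings
    wins = pythagorean_pct(context_runs, ra_per_nine) * decisions * era_factor
    return round(wins, 2), round(decisions - wins, 2)

print(deserved_record(3.88, 61.0, 276 + 2/3, 31.6, 1.004))    # Gooden: close to the article's 25.17 and 6.48
print(deserved_record(4.62, 82.33, 235 + 1/3, 26.9, 1.004))   # Saberhagen: close to the article's 18.45 and 8.47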

 

 
 

COMMENTS (12 Comments, most recent shown first)

tangotiger
By the way, you may notice that Bill is constantly creating a new pitching metric from scratch. It's basically his way to always start with a clean slate, to free himself of whatever may have led him to wherever he happened to be. Which makes it interesting that he's always coming up with something new, albeit centered around those three pillars.

What we're all doing is creating ESTIMATES of a pitcher's performance, and there's a dozen ways to get there. While they coalesce for most pitchers most of the time, sometimes they don't. Ricky Nolasco is the typical example of where you get diverging ESTIMATES depending on your starting assumptions.

All to say: this is tough.

8:53 AM Aug 9th
 
CharlesSaeger
Also, how much different would the estimates be if you used the Pythagorean exponent of R/G^0.285 instead of a straight ^2?
9:10 PM Aug 8th
 
tangotiger
Just a side note:

Fangraphs uses FIP as its core for WAR. Baseball Reference uses RA/9 as its core for WAR.

We know both are wrong, but it's useful that both have taken the two extreme positions. Since the true answer is somewhere in the middle, I've basically championed the idea of doing a 50/50 and moving on.

Bill has taken a similar position, except it's a 33/67 weighting respectively.

Additional sidenote: Baseball Prospectus at its core uses RC for their WAR. Which is really the third choice available.

Final sidenote: Bill's Season Score and my version of Game Score is basically an amalgamation of these three positions, in some weighting or other. We've basically agreed on what everything should look like, and now it's just a matter of figuring out how much each of the three pillars should get weighted.
9:02 PM Aug 8th
 
CharlesSaeger
Would the reason for the higher decisions per out be that you can lose the game with any number of outs, but need at least 15 to get the win, and with teams using more relievers, a starting pitcher gets fewer outs after the minimum number he needs to get the decision in question?
9:00 PM Aug 8th
 
MarisFan61
Oh i see -- you meant to use TWO digits rather than 3 or 4.
7:04 PM Aug 8th
 
MarisFan61
Question about possible 'slip of the pen': Didn't you mean waste of time to use 4 digit park factor, not 3?
6:43 PM Aug 8th
 
Riceman1974
Would it not be better to estimate decisions based on outs that specific year as opposed to era? The 1980s and 1990s were so vastly different, just like any 20 year period in human history, in any context. By assigning decisions on a year by year basis, you are comparing apples to apples, as opposed to apples to apple pie.
5:10 PM Aug 8th
 
Guy123
Ah, now I see. In previous versions you used the pitcher's actual number of decisions, but now you are making an adjustment. Good change.
1:44 PM Aug 8th
 
Guy123
This is a fun and interesting topic. I wonder though what is the theory behind assuming a starter will have the same number of decisions in the "Deserved" scenario as he had in the actual season? The number of total decisions a pitcher had was itself a function of the very context and luck that you are seeking to remove, e.g. the number and timing of runs scored by his teammates (as well as how well the pitcher pitched). It would seem to be a relatively easy calculation to estimate how many decisions a pitcher "deserved," given the number of games he started and his pitching performance. If one is going to go to all the trouble to create counterfactual seasons for these pitchers, why pretend that they will always have the same total number of decisions?
1:22 PM Aug 8th
 
sroney
I suppose it is some sort of OCD that makes me want to figure out the difference in batting average on balls in play when calculated from the pitcher or the hitter.

If the pitcher calculation uses batters faced - (walks + HBP + strikeouts + homers) to get the balls in play, that would end up counting catcher's interference as a batted ball, which would not show up as an at bat in the batter's stats, and that difference might be about the right size for the discrepancy shown.
12:17 PM Aug 8th
 
OldBackstop
That all makes good sense.

Bill, I asked this in the comments on the first article, but how are you handling pitchers who had a significant amount of relief innings and decisions in their career totals?
11:21 AM Aug 8th
 
CharlesSaeger
More or less what one should have expected, judging from the other articles.

Do you have a data file for all this?
10:12 AM Aug 8th
 
 