May 1, 2007
I. Set Up

A couple of years ago I did a television show which consisted on the air of being yelled at by Alan Dershowitz, which wasn’t fun, but off the air of sitting in a room for eight hours with Steve Garvey and Dave Parker and Bill Lee and a couple of other guys, which was a lot of fun. Lee and Parker and Garvey spent an hour or more telling stories about the Nasty Dutchman, many of which I can’t repeat. One of them remembered a time when Blyleven bribed a driver to drive the bus several miles out of the way so that he could moon his ex-wife’s house from the team bus. Dave Parker told a story about Blyleven, on national television, ostentatiously picking his nose, attaching the booger to the baseball and striking out Parker with a booger ball.

Blyleven probably doesn’t want me telling these stories, and who knows whether they are true or not? For many years I have been fascinated by the Bert Blyleven problem, and I have long wanted to do a definitive study of the issue. The Bert Blyleven problem, simply stated, is that Blyleven’s won-lost record does not jibe with his innings pitched and ERA. Blyleven pitched just short of 5,000 innings in his career, with a 3.31 ERA. Other pitchers with comparable combinations won 300 games, 310, 320. Blyleven didn’t.

This shortfall is well known, and Bert Blyleven has been left out of the Hall of Fame because of it. Life isn’t always fair. Other pitchers with 280 wins, and some with 220 wins, and some with fewer, have made the Hall of Fame, and with the same ERA. All right; life’s a bitch.

Bert Blyleven is an intriguing figure because he is the most conspicuous victim of what most of us regard as a malicious fiction. You ask any baseball writer from Blyleven’s era why Bert hasn’t been bronzed, and the guy will tell you "He wasn’t a winner. His team scored 3 runs, he gave up 4. They scored 1, he gave up 2. He had good numbers, but he didn’t win the tough games."

Most of us don’t believe that this ability to win the close games really exists, and many of us kind of resent Blyleven being discriminated against because he fails a bullshit test. Still, in theory, Bert’s detractors could be on to something. Suppose that you have two pitchers. One, whom we will call Ferguson Winner, loses a game 6-0, but wins six others 1-0, 2-1, 3-2, 4-3, 5-4 and 6-5. The other, whom we will call Bert Loser, wins a game 6-0, but loses six others by the same scores (1-0, 2-1, etc.)

The run support for both pitchers is exactly the same: 21 runs in 7 games. Their runs allowed are exactly the same: 21 runs in 7 games. But Ferguson has gone 6-1 and Bert has gone 1-6, because Ferguson has matched his effort to the runs he has to work with. My point is, there could be something there that isn’t measured by run support and isn’t measured by runs allowed. We’ll call it the ability to match.
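The thought experiment is easy to check mechanically. Below is a minimal Python sketch in which each game is written as (his team's runs, opponent's runs). The two game lists only restate the scores given above. The function name `summarize` is illustrative and is not part of any method described in this article.

```python
# Each game is (runs scored by the pitcher's team, runs scored by the opponent).
FERGUSON_WINNER = [(0, 6), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5)]
BERT_LOSER = [(6, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]


def summarize(games):
    """Return (runs scored, runs allowed, wins, losses) for a list of games."""
    scored = sum(us for us, _ in games)
    allowed = sum(them for _, them in games)
    wins = sum(1 for us, them in games if us > them)
    losses = sum(1 for us, them in games if us < them)
    return scored, allowed, wins, losses


print(summarize(FERGUSON_WINNER))  # (21, 21, 6, 1)
print(summarize(BERT_LOSER))       # (21, 21, 1, 6)
```

The runs scored and allowed are identical for both pitchers; only the won-lost columns differ.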

If Blyleven in fact had an inability to match the effort needed, it must be possible to document this by examining his career log game-by-game alongside that of comparable pitchers. Beyond that, it must be possible to place a value on it, or to measure the cost of it. The Sabermetric Encyclopedia estimates that Blyleven was 344 runs better than an average pitcher over the course of his career. How many of those 344 runs should be offset because of this inability to match the effort needed?

As a rule of thumb, a pitcher can be expected to be about one game over .500 for each three to five runs that he saves. Luis Tiant was 57 games over .500 (229-172) and saved 172 runs. Carl Hubbell saved 355 runs—about the same number as Blyleven—and was 99 games over .500 (253-154). Roger Clemens through 2006 has saved 727 runs and is 170 games over .500. Pedro Martinez through 2004 has saved 506 runs and is 114 games over .500. Bret Saberhagen saved 241 runs and was 50 games over .500. These are normal ratios.
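As a quick check of the rule of thumb, here is a minimal Python sketch that divides runs saved by games over .500 for the five pitchers just cited. It assumes that "games over .500" means wins minus losses, as in Tiant's 229-172, which is 57 over. The helper name is hypothetical.

```python
# (runs saved, games over .500) as quoted above; games over .500 = wins - losses.
EXAMPLES = {
    "Luis Tiant": (172, 57),
    "Carl Hubbell": (355, 99),
    "Roger Clemens (through 2006)": (727, 170),
    "Pedro Martinez (through 2004)": (506, 114),
    "Bret Saberhagen": (241, 50),
}


def runs_per_game_over(runs_saved: int, games_over: int) -> float:
    """Runs saved for each game over .500 (hypothetical helper)."""
    return runs_saved / games_over


for name, (runs, games) in EXAMPLES.items():
    print(f"{name:30s} {runs_per_game_over(runs, games):.2f}")
```

Every ratio falls between about 3.0 (Tiant) and 4.8 (Saberhagen). That is the three-to-five band the rule describes.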

Blyleven, however, saved 344 runs but was only 37 games over .500. Among the 25 other pitchers in major league history who were at least 300 runs better than league in ERA, most (15 of the 25) were more than 100 games over .500, and all were at least 44 games over .500. Phil Niekro was only 44 games over, but the reasons for that are fairly obvious; Niekro pitched most of his career for bad teams, and also allowed more than 300 un-earned runs, many of them because of Passed Knuckleballs. Blyleven did neither of those things. He pitched for as many good teams as bad teams, and he allowed fewer than 200 un-earned runs.

If Blyleven were one game over .500 for each four runs saved, he would have finished with about 311 career wins (311-226). The Sabermetric Encyclopedia credits Blyleven with 313 "Neutral Wins" (313-224). He’s actually 287-250.

Blyleven’s won-lost record is about 196 runs worse than his ERA. He is 37 games over .500; at four runs per game, that’s equivalent to 148 runs. He should be 344 runs better than average. That’s a 196-run discrepancy. These odd ratios create a prima facie case that Blyleven was guilty of the failure to match the effort needed. I’m skeptical, but. . .we’ll never know unless we look.
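The arithmetic behind the 311-226 and 196-run figures is short enough to spell out. This is a minimal Python sketch that uses the four-runs-per-game ratio assumed in the text and reads "games over .500" as wins minus losses. The helper name `expected_record` is hypothetical.

```python
def expected_record(runs_saved: float, decisions: int, runs_per_game: float = 4.0):
    """Split a pitcher's decisions into wins and losses, assuming one game
    over .500 (one more win than loss) for every `runs_per_game` runs saved."""
    games_over = runs_saved / runs_per_game
    wins = (decisions + games_over) / 2
    losses = (decisions - games_over) / 2
    return wins, losses


DECISIONS = 287 + 250                        # Blyleven's actual 287-250
wins, losses = expected_record(344, DECISIONS)
print(f"expected {wins:.1f}-{losses:.1f}")   # expected 311.5-225.5

actual_games_over = 287 - 250                # 37 games over .500
runs_equivalent = actual_games_over * 4      # 148 runs
print("discrepancy:", 344 - runs_equivalent)  # discrepancy: 196
```

The fractional result, 311.5-225.5, rounds to the "about 311 career wins (311-226)" given in the text.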

II. The Method

Suppose that we form a group of pitchers who are similar to Blyleven in terms of games started, innings pitched and ERA, but different from Blyleven in terms of wins and losses.

The group that I formed has seven pitchers—Steve Carlton, Tommy John, Jim Kaat, Ferguson Jenkins, Don Sutton and Phil Niekro, plus Blyleven. Carlton, Jenkins, Sutton and Niekro are in the Hall of Fame; Kaat and John aren’t, but then their other stats aren’t quite as good as Blyleven’s.

Anyway, these seven pitchers made an average of 684 starts in their careers. Blyleven made 685. They pitched an average of 4,945 innings; Blyleven pitched 4,970. They allowed an average of 2,073 runs, 1,835 of them earned; Bert allowed 2,029 runs, 1,840 of them earned. They had an average ERA of 3.32; Blyleven’s was 3.31.

These pitchers, then, are on average very, very similar to Bert Blyleven. But whereas Blyleven won 287 games and lost 250, the seven pitchers won an average of 302 and lost 245, 57 games over .500. Take out Blyleven, and the other six won an average of 304. Among the seven pitchers, Blyleven is first in runs saved against average (because he played in a slightly higher run context than the others), but last in winning percentage.

These pitchers are very similar to Blyleven in the aggregate, and they are also generally similar to him individually, particularly with regard to ERA. These are the career ERAs of the seven pitchers:
Pitcher ERA
Blyleven 3.31
Sutton 3.26
Jenkins 3.34
Kaat 3.45
John 3.34
Carlton 3.22
Niekro 3.35

All of them have career ERAs within .14 of Blyleven’s. The key question is, why did Blyleven not win the games he should have won? There are two possible explanations:
  1. Blyleven’s run support was weaker than that of the other pitchers, or
  2. Blyleven did a poorer job than the other pitchers of matching the effort.
What we mean, specifically, by failing to match the effort is that, given two runs to work with, he was less likely to deliver a victory than were the other pitchers in the group. Given three runs to work with, he was less likely to deliver a victory than were the other pitchers in the group. Given six runs to work with, he was more likely than the others to blow up and let the other team win anyway.

I compiled game-by-game pitching logs for the careers of each of these seven pitchers. Obviously, I used Retrosheet data to do this, and I am very grateful to Retrosheet for making this available, and also to several individuals (Dave Smith, Clem Comley and others) who responded to my request for help when I was missing a few appearances for Jim Kaat. Also, while I am on the subject, my son Isaac helped me with this project.

Anyway, my purpose in doing this was to be able to look at all of the games started by Bert Blyleven and the other pitchers, and ask two questions:
  1. When supported by X runs, how often did Blyleven deliver a win?
  2. How does this compare to the other pitchers in the group?
My intention, by doing this, was to take a range of issues off the table, to focus directly on the ability to match the effort. Ferguson Jenkins was a better hitter than Blyleven, thus had more runs to work with? Doesn’t matter. How he got the runs is not relevant to what we are talking about right now. Since these other pitchers allowed runs at the same rate that Blyleven did, then, given a certain number of runs to work with, he should have the same ability to win as they do—regardless of where the runs come from.

Don Sutton pitched in much lower run contexts than Blyleven? It doesn’t matter. Their career ERAs are essentially the same. If two pitchers have the same ERA, one would expect them to have the same ability to win, given a certain level of run support. The fact that one of them works in a hitter’s park and one in a pitcher’s park doesn’t change that. It changes their expected run support, but not their expected won-lost record with a fixed level of run support. Nothing should change the pitcher’s ability to win with a given level of run support, except, to a minor extent, un-earned runs. And we’re not worrying about the un-earned runs because Blyleven didn’t allow many un-earned runs, which therefore can’t be the reason he didn’t win more.

III. Flotsam

Before giving you the straight results, let me first report on one rather stunning data point from the study.

When given three runs to work with—not three or more but three exactly—Don Sutton had a career won-lost record of 52-33, and his teams had a won-lost record of 65-51.

When given three runs to work with, Bert Blyleven had a career won-lost record of 29-48, and his teams had a career record of 35-62. Sutton’s winning percentage, working with three runs, was .612. Blyleven’s was .377.

In other words, just in this one data point—three runs to work with—Don Sutton was a whopping 19 games better than Blyleven, and his teams were 20 ½ games better. If that data were representative of the entire study to even a tiny extent, we would have to conclude that Sutton was massively better than Blyleven at matching the effort needed to win the game.
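The "19 games" and "20 ½ games" figures match the familiar games-behind formula used in the standings. Below is a minimal Python sketch. It assumes this is how the gap was computed, which the text doesn't say, and the helper names are hypothetical.

```python
def games_better(a, b):
    """How many games record `a` is ahead of record `b`, computed the way
    standings compute games behind: ((Wa - Wb) + (Lb - La)) / 2."""
    (wins_a, losses_a), (wins_b, losses_b) = a, b
    return ((wins_a - wins_b) + (losses_b - losses_a)) / 2


def wpct(record):
    """Winning percentage of a (wins, losses) record."""
    wins, losses = record
    return wins / (wins + losses)


# Records with exactly three runs of support, as given in the text.
sutton, blyleven = (52, 33), (29, 48)
sutton_team, blyleven_team = (65, 51), (35, 62)

print(f"{wpct(sutton):.3f} vs {wpct(blyleven):.3f}")  # 0.612 vs 0.377
print(games_better(sutton, blyleven))                # 19.0
print(games_better(sutton_team, blyleven_team))      # 20.5
```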

Another note: On September 17, 1978, Phil Niekro had 19 wins with three starts remaining. He pitched well all three times and completed all three games in an effort to win his 20th game—but his team was shut out all three times. He lost 2-0, 2-0 and 4-0.

Another note: the best game of Jim Kaat’s career was on September 18, 1967 at Kansas City—a ten-inning shutout in which Kaat struck out 12 batters and walked no one, winning the game 2-0, Game Score of 92.

The worst game of Kaat’s career was ten years later to the day—September 18, 1977—and also in Missouri. At St. Louis on that date, Kaat pitched one and one-third innings, giving up 9 hits and 7 runs, and losing the game 5-12, Game Score of 8.

IV. Actual Results of the Study

Blyleven’s relatively poor won-lost records are primarily a result of poor offensive support. He was below the group average in terms of his ability to match the effort of the opposing pitcher, and this did cost him a few games over the course of his career—somewhere between 7 and 11 games. But most of the discrepancy is caused by sub-standard offensive support.

In Blyleven’s 685 major league starts, his team scored 2,869 runs, or 4.19 runs per start. This is the data for the seven pitchers in the study:
Pitcher     GS   Team Runs   Runs per Start
Jenkins    594     2605          4.39
Kaat       625     2737          4.38
Carlton    709     3096          4.37
John       700     2969          4.24
Niekro     716     3022          4.22
Blyleven   685     2869          4.19
Sutton     756     3130          4.14
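The per-start averages and the ranking can be recomputed from the games-started and team-runs columns. This short Python sketch simply transcribes the table above.

```python
# (games started, team runs scored in those starts), from the table above.
SUPPORT = {
    "Jenkins": (594, 2605),
    "Kaat": (625, 2737),
    "Carlton": (709, 3096),
    "John": (700, 2969),
    "Niekro": (716, 3022),
    "Blyleven": (685, 2869),
    "Sutton": (756, 3130),
}

runs_per_start = {name: runs / gs for name, (gs, runs) in SUPPORT.items()}

# Rank from best to worst run support.
ranking = sorted(runs_per_start, key=runs_per_start.get, reverse=True)
for name in ranking:
    print(f"{name:9s} {runs_per_start[name]:.2f}")
```

Blyleven comes out second from the bottom, ahead of only Sutton.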

Blyleven’s offensive support was the second-poorest among the group of seven pitchers—and it is actually worse than the average reveals (although not worse than Sutton’s). This chart gives the winning percentage of their teams with one of these pitchers on the mound and a given number of runs to work with:
Run Support          Team Wpct
More than 12 runs      1.000
12 runs                 .959
11 runs                 .964
10 runs                 .963
9 runs                  .916
8 runs                  .857
7 runs                  .882
6 runs                  .805
5 runs                  .734
4 runs                  .579
3 runs                  .474
2 runs                  .347
1 run                   .151
0 runs                  .000

The average number of runs per start can be misleading in this way: 10 runs in one game and a shutout in the next are not the same as 5 runs in each start. If you give one of these pitchers 5 runs in each of two starts, that creates an expectation of 1.468 wins (.734 twice). If you give him 10 runs in one start but none in the next, that creates an expectation of 0.963 wins, a large difference. Blyleven’s offensive support is actually worse than it looks, because he had a disproportionate number of games when he was denied those critical first few runs which have an exaggerated impact on the win total.
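The two-start comparison comes down to looking up each start's run support in the winning-percentage chart and summing the results. Below is a minimal Python sketch that treats the chart as a plain lookup table. A key of 13 stands in for "more than 12 runs"; that encoding is only a convenience for this example.

```python
# Team winning percentage by runs scored (chart above); key 13 = "more than 12 runs".
TEAM_WPCT = {
    0: 0.000, 1: 0.151, 2: 0.347, 3: 0.474, 4: 0.579, 5: 0.734, 6: 0.805,
    7: 0.882, 8: 0.857, 9: 0.916, 10: 0.963, 11: 0.964, 12: 0.959, 13: 1.000,
}


def expected_wins(runs_by_start):
    """Sum the chart's winning percentage for each start's run support."""
    return sum(TEAM_WPCT[min(runs, 13)] for runs in runs_by_start)


print(f"{expected_wins([5, 5]):.3f}")   # 1.468
print(f"{expected_wins([10, 0]):.3f}")  # 0.963
```

Both cases supply ten runs over two starts. The evenly spread runs are worth about half a win more.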

This chart gives the number of times in his career that Blyleven was supported by each number of runs:
Run Support          Starts
More than 12 runs      11
12 runs                 8
11 runs                 6
10 runs                16
9 runs                 19
8 runs                 36
7 runs                 43
6 runs                 46
5 runs                 90
4 runs                 82
3 runs                 97
2 runs                104
1 run                  85
0 runs                 42

Blyleven was shut out 42 times in his career, which is a little below the group average. But he was forced to work with one or two runs 189 times in his career, or 28% of his career starts. By contrast, Ferguson Jenkins was limited to 1 or 2 runs in only 22% of his career starts, Phil Niekro 23%, and no other pitcher in the group more than 26%.
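The 28%, 22% and 23% figures can be reproduced from the per-run start counts in the career logs in Section V. This minimal Python sketch adds each pitcher's one-run and two-run starts and divides the sum by his total starts.

```python
# (starts with 2 runs, starts with 1 run, total starts), from the Section V logs.
LOW_SUPPORT = {
    "Blyleven": (104, 85, 685),
    "Sutton": (94, 101, 756),
    "John": (97, 84, 700),
    "Kaat": (85, 69, 625),
    "Carlton": (102, 71, 709),
    "Niekro": (104, 59, 716),
    "Jenkins": (74, 57, 594),
}

share = {
    name: (two + one) / starts
    for name, (two, one, starts) in LOW_SUPPORT.items()
}

for name, (two, one, starts) in LOW_SUPPORT.items():
    print(f"{name:9s} {two + one:3d} of {starts} starts ({share[name]:.0%})")
```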

We can figure, for each pitcher, his "expected team winning percentage", based on one of the charts above. If the pitcher’s team scores 7 runs, this creates an expectation of .882 wins, since the winning percentage of their teams with one of these pitchers on the mound and 7 runs on the scoreboard was .882. But if the team scores only one run, that creates an expectation of only .151 wins.

Blyleven’s teams, by the number of runs they scored, had an expectation of 371 wins, 313 losses and one tie in 685 starts. Their actual record was 364-321. Blyleven fell short by seven and a half games, and it may be appropriate to dock him somewhere between 60 and 85 runs for that shortfall.
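Applying the winning-percentage chart to Blyleven's run-support distribution produces the expected total. Below is a minimal Python sketch that reuses the chart values as a lookup table, with key 13 standing for "more than 12 runs" (a convenience of this example). The sum comes to about 371.5 expected wins, which the text reports as 371 wins, 313 losses and one tie.

```python
# Team winning percentage by runs scored (chart above); key 13 = "more than 12 runs".
TEAM_WPCT = {
    0: 0.000, 1: 0.151, 2: 0.347, 3: 0.474, 4: 0.579, 5: 0.734, 6: 0.805,
    7: 0.882, 8: 0.857, 9: 0.916, 10: 0.963, 11: 0.964, 12: 0.959, 13: 1.000,
}
# Blyleven's starts by run support (chart above), same key convention.
BLYLEVEN_STARTS = {
    13: 11, 12: 8, 11: 6, 10: 16, 9: 19, 8: 36, 7: 43,
    6: 46, 5: 90, 4: 82, 3: 97, 2: 104, 1: 85, 0: 42,
}

starts = sum(BLYLEVEN_STARTS.values())  # 685
expected = sum(TEAM_WPCT[runs] * n for runs, n in BLYLEVEN_STARTS.items())
actual_wins, actual_losses = 364, 321

print(f"expected {expected:.1f}-{starts - expected:.1f}")  # expected 371.5-313.5
print(f"actual   {actual_wins}-{actual_losses}")
print(f"shortfall {expected - actual_wins:.1f} wins")       # shortfall 7.5 wins
```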

But of the 196-run discrepancy in Blyleven’s career (the discrepancy between his runs saved and his won-lost record), about two-thirds is explained by poor offensive support. Only about one-third is attributable to his failure to match the effort needed.

V. Data Overkill

The charts following give the number of times that each pitcher in the study was given each number of runs to work with, the pitcher’s individual won-lost record and ERA in those games, and the pitcher’s team’s won-lost record in those games.
Phil Niekro   GS W L ERA Team Wins Losses
More than 12 runs 10 8 0 2.42 10 0
12 runs 5 3 0 5.35 5 0
11 runs 6 5 0 3.77 5 1
10 runs 15 14 0 2.46 15 0
9 runs 24 19 1 3.93 20 4
8 runs 35 19 3 4.38 31 4
7 runs 46 30 3 3.38 39 7
6 runs 67 40 4 3.19 53 14
5 runs 86 51 8 3.34 65 21
4 runs 92 41 35 3.71 48 44
3 runs 107 34 43 3.24 45 62
2 runs 104 32 58 2.89 36 67
1 runs 59 6 49 3.52 7 51
0 runs 60 0 58 3.45 0 60
Total 716 302 262 3.37 379 335
Steve Carlton   GS W L ERA Team Wins Losses
More than 12 runs 15 13 0 3.64 15 0
12 runs 6 5 0 1.20 6 0
11 runs 11 9 1 3.20 10 1
10 runs 15 10 0 3.85 15 0
9 runs 26 18 0 4.44 26 0
8 runs 42 24 3 4.19 35 7
7 runs 40 29 3 3.32 32 8
6 runs 65 43 4 3.39 55 10
5 runs 77 48 13 2.96 58 19
4 runs 89 47 17 2.99 57 31
3 runs 94 35 39 3.09 44 50
2 runs 102 34 55 2.80 37 65
1 runs 71 12 52 2.97 13 58
0 runs 56 0 53 3.37 0 56
Total 709 327 240 3.20 403 305

Don Sutton   GS W L ERA Team Wins Losses
More than 12 runs 9 8 0 4.39 9 0
12 runs 9 6 0 7.88 8 1
11 runs 7 6 0 3.94 7 0
10 runs 14 12 0 1.77 13 1
9 runs 19 15 1 2.81 17 2
8 runs 29 17 2 4.39 25 4
7 runs 59 38 4 3.39 52 7
6 runs 73 39 6 3.49 59 14
5 runs 77 48 9 3.10 59 18
4 runs 100 46 26 3.24 60 40
3 runs 116 52 33 2.91 65 51
2 runs 94 24 49 3.01 33 61
1 runs 101 10 80 3.40 12 89
0 runs 49 0 43 3.23 0 49
Total 756 321 253 3.27 419 337
Jim Kaat   GS W L ERA Team Wins Losses
More than 12 runs 5 4 0 4.24 5 0
12 runs 5 4 0 3.53 5 0
11 runs 9 8 0 2.39 9 0
10 runs 19 16 0 4.04 18 1
9 runs 24 17 0 3.77 22 2
8 runs 34 19 2 3.98 30 4
7 runs 46 24 2 3.69 44 2
6 runs 58 33 5 3.40 44 14
5 runs 73 41 11 3.86 53 20
4 runs 73 28 19 3.94 38 33
3 runs 88 32 35 3.12 39 49
2 runs 85 26 52 3.37 29 56
1 runs 69 8 55 2.88 9 59
0 runs 37 0 36 2.53 0 37
Total 625 260 217 3.43 345 277

Ferguson Jenkins GS W L ERA Team Wins Losses
More than 12 runs 7 6 0 5.53 7 0
12 runs 11 10 0 3.50 11 0
11 runs 6 6 0 3.35 6 0
10 runs 15 11 0 3.86 15 0
9 runs 23 17 0 3.53 23 0
8 runs 29 24 0 3.12 29 0
7 runs 50 32 2 3.95 43 7
6 runs 41 27 4 3.13 36 5
5 runs 66 35 15 4.08 43 22
4 runs 90 43 34 3.32 48 42
3 runs 75 32 31 2.76 38 37
2 runs 74 23 41 3.00 26 48
1 runs 57 12 42 2.70 12 45
0 runs 50 0 49 3.50 0 50
Total 594 278 218 3.32 337 256
Tommy John GS W L ERA Team Wins Losses
More than 12 runs 8 4 0 4.10 8 0
12 runs 5 2 0 5.92 5 0
11 runs 10 7 0 3.29 10 0
10 runs 13 9 1 4.45 12 1
9 runs 19 9 1 5.34 18 1
8 runs 32 19 2 4.67 23 9
7 runs 49 28 0 3.34 43 5
6 runs 71 38 9 3.88 53 18
5 runs 78 44 10 3.08 56 22
4 runs 95 45 27 3.43 58 37
3 runs 103 41 32 2.74 56 47
2 runs 97 28 51 3.09 33 64
1 runs 84 10 62 2.99 11 73
0 runs 36 0 33 3.09 0 36
Total 700 284 228 3.35 386 313

Bert Blyleven GS W L ERA Team Wins Losses
More than 12 runs 11 10 0 2.29 11 0
12 runs 8 6 0 4.18 7 1
11 runs 6 4 0 3.40 6 0
10 runs 16 12 0 3.26 15 1
9 runs 19 13 1 3.80 16 3
8 runs 36 25 2 3.85 30 6
7 runs 43 30 1 2.95 39 4
6 runs 46 28 2 3.21 39 7
5 runs 90 52 13 3.11 67 23
4 runs 82 38 19 3.42 49 33
3 runs 97 29 48 3.32 35 62
2 runs 104 24 57 3.17 35 69
1 runs 85 15 65 3.45 15 70
0 runs 42 0 40 3.57 0 42
Total 685 286 248 3.31 364 321

I highlighted before the remarkable difference between Sutton and Blyleven in working with three runs. But only Sutton was that effective with three runs. Niekro, Kaat and Steve Carlton, given three runs to work with, had losing records.

Working with just one run, Blyleven ties Carlton at .188, the best winning percentage in the group except for Ferguson Jenkins (12-42, .222). Blyleven’s total of 15 career 1-0 victories is one of the highest of all time; this is well known. Working with four runs, six runs, seven runs or eight runs, Blyleven was better than Sutton. This narrows the gap between them.
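The three-run, one-run and Blyleven-versus-Sutton comparisons in the two paragraphs above can be read directly from the career logs. This minimal Python sketch transcribes only the rows it needs.

```python
# (wins, losses) when given exactly N runs, transcribed from the career logs.
BLYLEVEN = {1: (15, 65), 4: (38, 19), 6: (28, 2), 7: (30, 1), 8: (25, 2)}
SUTTON = {4: (46, 26), 6: (39, 6), 7: (38, 4), 8: (17, 2)}
CARLTON_ONE_RUN = (12, 52)
THREE_RUNS = {
    "Niekro": (34, 43), "Carlton": (35, 39), "Sutton": (52, 33),
    "Kaat": (32, 35), "Jenkins": (32, 31), "John": (41, 32),
    "Blyleven": (29, 48),
}


def wpct(record):
    """Winning percentage of a (wins, losses) record."""
    wins, losses = record
    return wins / (wins + losses)


# Pitchers with losing records when given exactly three runs.
losing_with_three = sorted(
    name for name, (wins, losses) in THREE_RUNS.items() if wins < losses
)
print(losing_with_three)

# With one run, Blyleven (15-65) and Carlton (12-52) are both at .1875.
print(wpct(BLYLEVEN[1]), wpct(CARLTON_ONE_RUN))

# With four, six, seven or eight runs, Blyleven's percentage beats Sutton's.
for runs in (4, 6, 7, 8):
    print(runs, f"{wpct(BLYLEVEN[runs]):.3f}", f"{wpct(SUTTON[runs]):.3f}")
```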

Blyleven was stuck with just one run to work with 85 times in his career, compared to 57 times for Ferguson Jenkins, 69 for Kaat, 71 for Carlton and 59 for Phil Niekro. Since those games are almost automatic losses, that in itself is a huge difference between Blyleven and his peers. (Only Don Sutton had as many one-run challenges as Blyleven.) But in the "nearly automatic victory" categories of six runs and up, Blyleven had only 185 starts in his career, as opposed to 208 for Niekro, 220 for Carlton, 219 for Sutton, 200 for Jim Kaat, and 207 for Tommy John. Only Ferguson Jenkins had fewer than Blyleven (182), and he had a considerably higher share of his starts in those categories (31% to 27%). Blyleven had six or more runs to work with in only 27% of his career starts, whereas all of the other pitchers in the study had six or more in at least 29% of their starts.
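The six-or-more counts are sums of the top eight rows of each career log. This minimal Python sketch transcribes those rows and divides each count by total starts.

```python
# Starts with six or more runs: the "More than 12 runs" through "6 runs" rows.
SIX_PLUS_ROWS = {
    "Blyleven": [11, 8, 6, 16, 19, 36, 43, 46],
    "Niekro": [10, 5, 6, 15, 24, 35, 46, 67],
    "Carlton": [15, 6, 11, 15, 26, 42, 40, 65],
    "Sutton": [9, 9, 7, 14, 19, 29, 59, 73],
    "Kaat": [5, 5, 9, 19, 24, 34, 46, 58],
    "John": [8, 5, 10, 13, 19, 32, 49, 71],
    "Jenkins": [7, 11, 6, 15, 23, 29, 50, 41],
}
TOTAL_STARTS = {
    "Blyleven": 685, "Niekro": 716, "Carlton": 709, "Sutton": 756,
    "Kaat": 625, "John": 700, "Jenkins": 594,
}

six_plus = {name: sum(rows) for name, rows in SIX_PLUS_ROWS.items()}
share = {name: six_plus[name] / TOTAL_STARTS[name] for name in six_plus}

for name in SIX_PLUS_ROWS:
    print(f"{name:9s} {six_plus[name]:4d} starts, {share[name]:.1%}")
```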

It may sound like a small thing, but Steve Carlton had 151 wins in those games; he was 151-11. Blyleven had a better percentage, but he had only 128 wins (he was 128-6). He’s short by 23 wins, 18 if you adjust for games started. It’s the difference between 288 wins and 300-plus. That, then, is what’s keeping him out of the Hall of Fame: a documented shortage of cheap wins.
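The 151-11 and 128-6 records are sums of the W and L columns over the same top eight rows. For the "18 if you adjust for games started" figure, this sketch assumes the adjustment scales Blyleven's wins up from his 685 starts to Carlton's 709. That assumption yields about 18.5, close to the figure in the text.

```python
# Wins and losses in starts with six or more runs (the top eight rows of each log).
CARLTON_W = [13, 5, 9, 10, 18, 24, 29, 43]
CARLTON_L = [0, 0, 1, 0, 0, 3, 3, 4]
BLYLEVEN_W = [10, 6, 4, 12, 13, 25, 30, 28]
BLYLEVEN_L = [0, 0, 0, 0, 1, 2, 1, 2]

carlton = (sum(CARLTON_W), sum(CARLTON_L))     # (151, 11)
blyleven = (sum(BLYLEVEN_W), sum(BLYLEVEN_L))  # (128, 6)
raw_gap = carlton[0] - blyleven[0]             # 23

# Assumed adjustment: scale Blyleven's wins from his 685 starts to Carlton's 709.
adjusted_gap = carlton[0] - blyleven[0] * 709 / 685

print(carlton, blyleven, raw_gap, round(adjusted_gap, 1))  # (151, 11) (128, 6) 23 18.5
```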

VI. A Couple of Methods

I developed a couple of methods to analyze this data, one of which I will report on and one of which I will bury. I developed a method to create what I call an "effective runs allowed rate", or "wins based runs allowed rate." The goal of the effective runs allowed rate was to measure the ability of a pitcher to match the effort needed to win, but stated as a runs allowed rate. Suppose you have two games of a pitcher, one with 11 runs of support, one with 2. The pitcher might win 11-4 and 2-1, or he might win 11-1 and lose 2-4. His ERA is the same either way (2.50), but his effectiveness is really very different. How do you measure that difference, stated as a runs allowed rate?

Here’s how I did it. If a pitcher’s team won the game, the runs he was charged with allowing were
    one-half the runs scored by his team, rounded down,
    but not to exceed 4 runs in a win.
If his team scored 7 runs and he won, that was entered as 7-3, regardless of whether the actual score was 7-0, 7-2, 7-3 or 7-6. It doesn’t matter; what matters is that he had seven runs to work with, and he won.
When his team lost, the number of runs he was charged with was
    the runs scored by his team, plus three,
    but always at least 1.5 times the number of runs he had to work with.
If a pitcher had two runs to work with and he lost, the game was entered as 2-5, regardless of whether it was actually 2-3, 2-6, or 2-27. The score doesn’t matter; the win matters.
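The two charging rules translate directly into a small function. Below is a minimal Python sketch of the rules as stated. It assumes every game is either a team win or a team loss, since ties aren't addressed in the text, and the function name is hypothetical. The sketch also replays the 11-run and 2-run example: actual runs allowed are five either way, but the effective runs differ.

```python
def effective_runs_allowed(team_runs: int, team_won: bool) -> float:
    """Runs charged to the starter under the effective-runs rules.

    Win:  half the team's runs, rounded down, capped at 4.
    Loss: the team's runs plus three, but at least 1.5 times the team's runs.
    """
    if team_won:
        return min(team_runs // 2, 4)
    return max(team_runs + 3, 1.5 * team_runs)


print(effective_runs_allowed(7, True))    # 3    (entered as 7-3)
print(effective_runs_allowed(2, False))   # 5    (entered as 2-5)
print(effective_runs_allowed(10, False))  # 15.0 (the 1.5x floor applies)

# The two-game example: 11 and 2 runs of support.
print(effective_runs_allowed(11, True) + effective_runs_allowed(2, True))   # 5
print(effective_runs_allowed(11, True) + effective_runs_allowed(2, False))  # 9
```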

For a season or for a career, this method creates about the same runs allowed rate for a pitcher as his actual runs allowed rate—unless he has an unusual ability to match the effort needed, or a poor ability to match the effort needed. If he fails to match the effort needed, his effective runs allowed rate will be higher.

Phil Niekro allowed 3.89 runs per nine innings in his career, and also 3.89 runs per nine innings as a starter. His effective runs allowed rate, based on his wins, losses and run support, was 3.94. This chart compares the actual and effective runs allowed rates for these seven pitchers, with the actual rates based only on their games as a starting pitcher:
Pitcher Effective Actual
Blyleven 3.89 3.67
Carlton 3.78 3.65
Jenkins 3.77 3.69
Kaat 3.89 3.86
Niekro 3.94 4.02
Sutton 3.75 3.89
T John 3.87 3.59
Average 3.84 3.77

From this we learn three things:
  1. That there is a slight misalignment between effective and actual runs allowed rates (.07 runs per nine innings),
  2. That Blyleven and Tommy John were less effective at matching the effort needed than the other five pitchers in this group, and
  3. That Don Sutton was more effective at matching the effort needed than any other pitcher in this group.
How many "phantom runs" should Blyleven be charged with, for his inability to win close games?

By this method, 83. Blyleven’s effective runs allowed rate was 0.22 higher than his actual runs allowed rate, of which .07 is attributable to mis-alignment of the method. The remainder is 0.15 runs per nine innings, or one run every 60 innings. For Blyleven’s career as a starting pitcher, that works out to 83 runs.
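The 83-run figure can be traced in a few lines. This minimal Python sketch computes the misalignment as the group's average effective rate minus its average actual rate, using the chart (about .074, which rounds to the .07 in the text). It subtracts that from Blyleven's .22 excess and converts runs per nine innings into a career total. As a stand-in for his innings as a starter, which the text does not give separately, it assumes Blyleven's career total of 4,970 innings.

```python
from statistics import mean

# (effective, actual) runs allowed per nine innings as a starter, from the chart above.
RATES = {
    "Blyleven": (3.89, 3.67), "Carlton": (3.78, 3.65), "Jenkins": (3.77, 3.69),
    "Kaat": (3.89, 3.86), "Niekro": (3.94, 4.02), "Sutton": (3.75, 3.89),
    "John": (3.87, 3.59),
}

# Misalignment of the method: group average effective rate minus group average actual rate.
avg_effective = mean(eff for eff, _ in RATES.values())
avg_actual = mean(act for _, act in RATES.values())
misalignment = round(avg_effective - avg_actual, 2)  # 0.07

blyleven_eff, blyleven_act = RATES["Blyleven"]
excess_per_nine = blyleven_eff - blyleven_act - misalignment  # about 0.15

# Assumption: career innings stand in for innings as a starter.
INNINGS = 4970
phantom_runs = excess_per_nine / 9 * INNINGS

print(misalignment, round(excess_per_nine, 2), round(phantom_runs))  # 0.07 0.15 83
```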

The other method was a run support-neutral winning percentage, intended to state each pitcher’s ability to win with a fixed number of runs as a winning percentage. This method was much more straightforward and logical than the effective runs allowed method, but unfortunately it doesn’t work very well, and I’m not going to explain it, although I will refer to it in passing later on. (What I mean by "it doesn’t work very well" is that it doesn’t predict the pitcher’s actual winning percentage within a reasonable margin. I understand why this is, but I don’t know how to fix it, so I’m just going to skip the whole thing.)

VII. Great Seasons

The best season by any of these pitchers, obviously, was Steve Carlton’s 1972 season. I credit Carlton that season with a run-neutral winning percentage of .846, and with an effective runs allowed rate of 2.80. The run-neutral winning percentage is the best of any of these pitchers in any season; the 2.80 effective runs allowed rate is second, behind Don Sutton’s 2.77 in 1980 (Sutton in 1980 was 13-5 with a league-leading 2.20 ERA.) Also, Jim Kaat was at 2.60 in 1972, but he made only 15 starts due to an injury.

Blyleven’s best "effective runs allowed rate", interestingly enough, was in 1976, when he finished with a 13-16 record but an effective runs allowed rate of 3.33. Blyleven won games that year by scores of 2-1, 1-0, 1-0, 3-0, 1-0, 3-0 and 1-0 (seven wins in which his team scored an average of 1.71 runs), but lost games by scores of 2-1, 1-0, 2-1, 3-0 and 1-0 (five losses in which he allowed an average of 1.80 runs). A normal number of games with scores that low in a season would be five or six; Blyleven had twelve. The only other pitcher in the study to have so many low-scoring games in a season was Carlton in ’72, but Carlton had
    a) five more starts, and
    b) an ERA almost a run lower (1.97 to 2.87).
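The 1.71 and 1.80 averages follow directly from the listed scores. Below is a minimal Python sketch that writes each game as (his team's runs, opponent's runs); the variable names are illustrative.

```python
# Blyleven's 1976 low-scoring decisions as (his team's runs, opponent's runs).
WINS = [(2, 1), (1, 0), (1, 0), (3, 0), (1, 0), (3, 0), (1, 0)]
LOSSES = [(1, 2), (0, 1), (1, 2), (0, 3), (0, 1)]

support_in_wins = sum(us for us, _ in WINS) / len(WINS)            # 12 / 7
allowed_in_losses = sum(them for _, them in LOSSES) / len(LOSSES)  # 9 / 5

print(f"{support_in_wins:.2f} {allowed_in_losses:.2f} {len(WINS) + len(LOSSES)}")  # 1.71 1.80 12
```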

VIII. Game Scores

On a blog, someone named Chris posted this comment related to the subject:
    His toughest loss was on September 22, 1972 when the eventual champion Athletics beat him 1-0 in the 11th inning. His game score was 82. Twice he got the loss with a game score over 80, seventeen times when it was over 70, 47 times when it was over 60, and 118 times when it was over 50. Though the number of tough losses always is far greater than the number of cheap wins for the six pitchers I've checked so far, Blyleven still has a very high number of tough losses.
Defining a "Cheap Win" as any win in which the pitcher had a Game Score under 50 and a "Tough Loss" as any loss in which the pitcher had a Game Score over 50, Blyleven had 26 Cheap Wins and 109 Tough Losses. This chart gives the data for the seven comparable pitchers:

Pitcher     GS > 50 W-L   GS < 50 W-L   Cheap Wins   Tough Losses
Blyleven      257-109        26-130          26           109
John          245-67         33-157          33            67
Jenkins       250-97         25-117          25            97
Kaat          223-70         35-142          35            70
Sutton        289-102        29-143          29           102
Carlton       311-101        15-133          15           101
Niekro        272-102        27-157          27           102

Blyleven did have more tough losses than the comparable pitchers. It doesn’t seem to me that this is all that helpful in the discussion. First, there is the problem of strikeouts figuring into Game Scores, which is a pretty minor problem but some people will choose to worry about it. A bigger problem is that it seems to me that if a pitcher did have an "inability to match", this might well result in a larger number of tough losses—not as a result of tough luck, but as a result of his inability to pitch well at the times he most needs to pitch well. The most striking thing on the chart, really, is the very small number of cheap wins by Steve Carlton.

A couple of other thoughts about Game Scores, since we’re here. I never intended Game Scores as a way of aggregating games, and I think there are some problems with doing that. It was intended as a way of placing the game itself (the individual game) in a kind of global context: "Wow, that was impressive. How impressive was it?" Or "Wow, that was ugly. How bad was that?"

Of course, when I invented Game Scores 25 years ago, I didn’t have access to complete game logs for large numbers of pitchers. It occurs to me that, had I had this access, I could hardly have done any better on a couple of points. I intended, in creating Game Scores, to create a zero-to-one hundred scale in which the vast majority of starts would fall somewhere in the middle, the good games in the 60s, the poor games in the 30s, the great games at 80 and above, the REALLY great games at 90 to 100.

On that level, it’s hard to imagine how I could have done much better. Look at these pitchers. . .

Phil Niekro in his career has game scores ranging from 91 to 3, three games in the 90s, two under ten.

Steve Carlton has game scores ranging from 98 to 3, eleven games in the 90s, four under ten.

Don Sutton has game scores ranging from 98 to 6, six games over 90, two under ten.

Jim Kaat has game scores ranging from 92 to 8, two games over 90, one under ten.

Ferguson Jenkins has game scores ranging from 94 to 6, five games over 90, one under ten.

Blyleven has game scores ranging from 97 to 7, eleven games over 90, three under ten.

Only Tommy John, among this group, has no games over 90; he has game scores ranging from 89 to 4, with three games under ten.

All of the pitchers fit neatly into the zero-to-hundred framework that I had intended, and all of them have these tiny "tails" to their charts that burn out just as they approach the boundaries, just as I had intended. Of course, there have been a few games historically that go over 100 or under zero, but essentially, the games fit very neatly on a zero-to-one-hundred chart.

Second, notice how consistently games over 50 are wins, and games under 50 are losses. Again, I’m very pleased with that.

But I missed on one thing. I had wanted a system in which games with Game Scores of 51-55 would mostly be wins, and games with Game Scores of 45-49 would mostly be losses. On that one, I missed. All of these pitchers had hugely successful records when they had Game Scores over 80, and very successful records in the 60s and 70s:
Pitcher     GS > 90   GS 80-89   GS 70-79   GS 60-69
Blyleven    11 - 0    61 - 2     69 - 15    76 - 30
John         0 - 0    35 - 0     76 - 7     74 - 30
Jenkins      4 - 0    65 - 2     81 - 14    69 - 36
Kaat         2 - 0    31 - 1     76 - 8     71 - 27
Sutton       6 - 0    64 - 1     90 - 12    85 - 33
Carlton     10 - 0    54 - 2    115 - 11    91 - 39
Niekro       3 - 0    44 - 0     87 - 11    85 - 47
Total       36 - 0   354 - 8    594 - 78   551 - 242
Pct          1.000     .978       .884       .695

None of these pitchers was ever charged with a loss in a game in which he had a Game Score above 90, and they were overwhelmingly successful with Game Scores in the 70s and 80s.

Similarly, Game Scores under 50 had a powerful tendency to end in defeat:
Pitcher     GS 40-49   GS 30-39   GS 20-29   GS 10-19   GS 0-9
Blyleven    21 - 45     4 - 40     1 - 30     0 - 12    0 - 3
John        28 - 50     5 - 51     0 - 37     0 - 17    0 - 2
Jenkins     15 - 42     8 - 38     2 - 30     0 - 6     0 - 1
Kaat        27 - 51     6 - 51     2 - 32     0 - 7     0 - 1
Sutton      21 - 47     6 - 41     0 - 38     2 - 15    0 - 2
Carlton     11 - 44     4 - 48     0 - 25     0 - 12    0 - 4
Niekro      20 - 52     5 - 54     2 - 37     0 - 12    0 - 2
Total      143 - 331   38 - 323    7 - 229    2 - 81    0 - 15
Pct          .302       .105       .030       .024      .000

None of the pitchers was able to win consistently with Game Scores under 50, and wins basically disappear at about 35.

But the data with Game Scores of 50-59 is all over the map:
Pitcher     GS 50-59
Blyleven    43 - 71
John        66 - 34
Jenkins     34 - 49
Kaat        45 - 39
Sutton      47 - 64
Carlton     42 - 55
Niekro      56 - 47
Total      333 - 359
Pct          .481

With Game Scores of 50 to 59, this group of pitchers was not able to win: collectively they were 333-359 (.481). Tommy John, with his fantastic double play rate, was able to win very consistently (a .660 winning percentage) with Game Scores of 50 to 59, but overall, this group of pitchers was not.
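Placing the group totals from the three Game Score charts side by side shows where the scale breaks. This minimal Python sketch transcribes those totals and computes a winning percentage for each band. The 50-59 band is the one that sits just under .500.

```python
# Group totals (all seven pitchers) by Game Score band, from the charts above.
BANDS = [
    ("GS > 90", 36, 0), ("GS 80-89", 354, 8), ("GS 70-79", 594, 78),
    ("GS 60-69", 551, 242), ("GS 50-59", 333, 359), ("GS 40-49", 143, 331),
    ("GS 30-39", 38, 323), ("GS 20-29", 7, 229), ("GS 10-19", 2, 81),
    ("GS 0-9", 0, 15),
]

pct = {}
for band, wins, losses in BANDS:
    pct[band] = wins / (wins + losses)
    print(f"{band:9s} {wins:3d}-{losses:<3d} {pct[band]:.3f}")
```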

That’s a failure for the system. I would have wanted a system in which pitchers with Game Scores of 50 to 59 were able to win 55 or 60% of their decisions, and I would have thought I had hit it closer than that. I didn’t get it. The things that are left out of the system are important enough to distort the data a little, at least on that point.

IX. Bullpen Support

There were a couple of issues I didn’t get into here, which were bullpen support and runs scored after the pitcher left the game. I didn’t get into these because
    a) I don’t have the technical sophistication to work with box scores that way, and
    b) I’m not sure how it should be done anyway.
On baseballprimer, Mike Emeigh wrote:
    I have checked the numbers. I looked at his run support on the basis of the game situation at the time of his departure, not a full game or per-nine-inning basis; IOW, if Blyleven got three runs of batting support through seven innings in which his team batted (he could have pitched anywhere from 6 innings to 7 2/3 innings in those games, depending on location and when he was removed from the game).
Well, but what if Blyleven pitched five innings and left with the score 2-1, and then his team scored two more runs in the 7th inning, making the score 4-1? Are those runs scored in support of Blyleven, or aren’t they?

It seems to me that they are. As long as Blyleven is the pitcher of record, runs scored by his team are scored in support of his won-lost record. I don’t see how carving those runs out of the study makes for a more accurate study, and in fact I would argue that it doesn’t.

Or consider this situation: Blyleven pitches six innings and leaves the game trailing 3-2. In the eighth inning his team scores 3 runs to take a 5-3 lead. Should those runs be counted in Blyleven’s account, or shouldn’t they?

Well, of course they should, because without them, Blyleven is charged with a defeat. Runs scored which have the effect of taking a loss away from the pitcher, it seems to me, obviously should not be excluded from the consideration of his offensive support.

How do we decide which runs to count in Blyleven’s support and which not? I don’t know. It’s a confusing issue, and I don’t have an answer. But here’s what I think about it.

First, the rules by which major league baseball credits wins and losses to individual pitchers are objectively silly, and there is no reason for a serious analyst to pay much attention to them in the process of figuring out what a pitcher’s "true" value has been. The more appropriate thing to do is to study the impact of Blyleven on the won-lost record of his team.

Second, Blyleven averaged 7.24 innings pitched per start throughout his career.

Unless we have real evidence that something weird is going on, like a colossal collapse of the bullpen or a bunch of runs being scored in a very few innings after Blyleven left, aren’t we better off assuming that the entire game represents a Blyleven game, rather than entering into a speculative and uncertain analysis based on questionable attribution of runs scored in support of Blyleven as an individual pitcher?

Third, Blyleven’s individual winning percentage was .534. The winning percentage of his teams in games that he started was .531—a lower figure than for any of the comparable pitchers except Niekro (also .531). What reason is there to believe that studying the innings specifically charged to Blyleven—even if we did know how to do that accurately—would give us a different answer than studying the entire games?

Emeigh concludes:
    Ergo, the difference between Blyleven and Hunter, Tiant, et. al. is almost entirely due to the fact that those other pitchers were better supported on a game-by-game basis.
I didn’t deal with Hunter and Tiant, but I don’t think that is quite true. I think the gap is due more to the offense than to any failure to match his effort to the runs he had, but it seems clear to me that there is also some actual failure to match on Blyleven’s part.

Another researcher, Eric Chalek, studied Blyleven’s bullpen support. A summary of Chalek’s research by some unidentifiable blogger is as follows:

(Chalek) found that there were 28 games in which Blyleven left the game losing only to be let off the hook as the offense came back for him. His record in those games was 1-0. Three times when he left a tie game the bullpen allowed inherited runners to score go ahead runs for the opposition. Blyleven ended up losing all those games. In another 47 games he left leading only to see the bullpen blow the lead. Blyleven got tagged for the loss in 3 of those games. Altogether his record was 1-6 in which the score changed hands after he left the game.

How un/lucky is that? Hard to say without a way to measure it. A record of 1-6 does sound rather unlucky though. Those games represent 11.4% of his starts. I have no idea if that's a high or low percentage for something like this because a sample size of one is tough to draw conclusions from. Also, in those 71 no-decisions, there's 17 more games where a lead was blown than gained. That's high but for a good pitcher it should be high (he should hand off more leads than deficits to the bullpen).

Sorting out 3 relief decision he had, and his complete games, it appears that Blyleven handed 164 leads over to his bullpen and 196 times he handed them a deficit. By blowing 47 leads, his bullpen preserved 71.3% of his leads, but they preserved 85.7% of the deficits (168/196 -- you get 168 by subtracting the 196 deficits mentioned here from the 28 rallies listed in the previous paragraph). Again, that sounds high.
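The percentages in that summary can be reproduced from its own counts. The short Python sketch below is an illustration, not the blogger's method. Note that the 168 preserved deficits come from subtracting the 28 rallies from the 196 deficits, not the other way around. The 685 career starts figure is taken from the paragraphs that follow, and the variable names are illustrative.

```python
# Counts from the summary of Chalek's research quoted above.
rallies = 28                  # left trailing; the offense came back
wins_after_rally = 1
ties_lost = 3                 # left tied; inherited runners scored; all three were losses
leads_blown = 47              # left leading; the bullpen blew the lead
losses_after_blown_lead = 3

starts = 685                  # career starts, from the paragraphs that follow
leads_handed = 164            # leads handed to the bullpen
deficits_handed = 196         # deficits handed to the bullpen

changed_hands = rallies + ties_lost + leads_blown                   # 78 games
record = (wins_after_rally, ties_lost + losses_after_blown_lead)    # 1-6
no_decisions = changed_hands - sum(record)                          # 71

# Among the no-decisions: blown leads (44) minus rescued deficits (27).
extra_blown = (leads_blown - losses_after_blown_lead) - (rallies - wins_after_rally)

leads_held = (leads_handed - leads_blown) / leads_handed            # 117 / 164
deficits_held = (deficits_handed - rallies) / deficits_handed       # 168 / 196

print(f"{changed_hands} games, {changed_hands / starts:.1%} of starts, record {record[0]}-{record[1]}")
print(f"{no_decisions} no-decisions; {extra_blown} more blown leads than rescued deficits")
print(f"leads held {leads_held:.1%}, deficits held {deficits_held:.1%}")
```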

I didn’t study Blyleven’s bullpen support, so I can’t comment on that directly. However, Chalek apparently didn’t study any comparable pitchers in this regard, so his data is without context, and maybe I can suggest a little context.

Blyleven made 685 career starts, and pitched 4,957 1/3 innings in those starts. If we assume that those games lasted nine innings on average—an assumption which carries a clear degree of risk—that would suggest that his bullpen may have pitched about 1,207 2/3 innings after he left the game.

Blyleven was charged with 2,021 runs allowed as a starting pitcher. The total runs scored by the opposition in games that he started were 2,547. Thus, Blyleven’s bullpens allowed 526 runs to score in approximately 1,207 2/3 innings, a number that could be seriously in error. But, as best I can estimate, Blyleven’s bullpens allowed about 3.92 runs per nine innings.

Blyleven was 286-248 as a starting pitcher; his teams were 364-321 in those games. His bullpen, then, was 78-73.
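The estimate in the last few paragraphs is simple arithmetic, but it is easy to slip on innings written in thirds. The following Python sketch keeps innings as exact fractions. The nine-innings-per-start figure is the same risky assumption the text flags. Everything else is a number given above, and the constant and variable names are illustrative.

```python
from fractions import Fraction

# Blyleven's career as a starter, from the text above.
starts = 685
starter_ip = Fraction(4957) + Fraction(1, 3)      # 4,957 1/3 innings
runs_charged_to_starter = 2021
opponent_runs_in_starts = 2547
starter_w, starter_l = 286, 248
team_w, team_l = 364, 321

# The risky assumption flagged in the text: every start lasted nine innings.
ASSUMED_INNINGS_PER_GAME = 9

ip_per_start = starter_ip / starts                                  # about 7.24
bullpen_ip = starts * ASSUMED_INNINGS_PER_GAME - starter_ip         # 1,207 2/3
bullpen_runs = opponent_runs_in_starts - runs_charged_to_starter    # 526
bullpen_r9 = bullpen_runs * 9 / bullpen_ip                          # runs per nine
relief_w, relief_l = team_w - starter_w, team_l - starter_l         # decisions going to relievers

print(f"Starter innings per start: {float(ip_per_start):.2f}")
print(f"Bullpen innings: {float(bullpen_ip):.2f}")
print(f"Bullpen runs per nine: {float(bullpen_r9):.2f}")
print(f"Relievers' decisions in his starts: {relief_w}-{relief_l}")
```

Keeping the innings as fractions avoids confusing 1,207 2/3 written in baseball's thirds notation ("1,207.2") with a true decimal (1,207.67). The result, about 3.92 runs per nine, is the figure shown for Blyleven in the bullpen table.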

These numbers do not appear to be remarkably different from the bullpen numbers of the comparable pitchers:
Pitcher    Innings   Runs   R/9    Wins   Losses
Blyleven   1207.67    526   3.92     78     73
Carlton    1215.67    502   3.72     76     65
Jenkins     987.33    403   3.67     59     38
John       1677.67    725   3.89    102     85
Kaat       1487.33    589   3.56     85     60
Niekro     1294.33    641   4.46     77     73
Sutton     1555.67    634   3.67     98     84
Average    1346.52    574   3.84   79.5   65.7

Blyleven’s bullpens had fewer wins and more losses and a higher estimated runs allowed average than the group norms, but not remarkably so. His bullpen support was probably a little bit on the weak side.
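The runs-per-nine figures in the bullpen table can be recomputed from its innings and runs columns. In this Python sketch, pooling all seven bullpens' runs and innings gives the 3.84 shown in the Average row. The innings are the table's decimal values, so 2/3 of an inning appears as .67. The names are illustrative.

```python
# (innings, runs) allowed by each pitcher's bullpens in his starts, from the table above.
bullpens = {
    "Blyleven": (1207.67, 526),
    "Carlton": (1215.67, 502),
    "Jenkins": (987.33, 403),
    "John": (1677.67, 725),
    "Kaat": (1487.33, 589),
    "Niekro": (1294.33, 641),
    "Sutton": (1555.67, 634),
}


def runs_per_nine(innings, runs):
    """Runs allowed per nine innings."""
    return runs * 9 / innings


for name, (ip, runs) in bullpens.items():
    print(f"{name:<9} {runs_per_nine(ip, runs):.2f}")

# Pool every bullpen's innings and runs for the group figure.
total_ip = sum(ip for ip, _ in bullpens.values())
total_runs = sum(runs for _, runs in bullpens.values())
print(f"{'Pooled':<9} {runs_per_nine(total_ip, total_runs):.2f}")
```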

X. One-Run Games

One-run games have an obvious bearing on this study. Remember the what-if example I gave in part one of this admittedly too long study?

One, whom we will call Ferguson Winner, loses a game 6-0, but wins six others 1-0, 2-1, 3-2, 4-3, 5-4 and 6-5. The other, whom we will call Bert Loser, wins a game 6-0, but loses six others by the same scores (1-0, 2-1, etc.)

In that example, Ferguson Winner has a record of 6-0 in one-run games, while Bert Loser has a record of 0-6 in one-run games. It bears on the issue, because if there were a pitcher who had an inability to win the close, low-scoring games, obviously that should be reflected in his record in one-run games.

Blyleven has the poorest record in one-run games of any pitcher in this study. The chart below gives:
    a) the pitcher’s individual wins in one-run games,
    b) his losses,
    c) his winning percentage,
    d) his team’s wins,
    e) his team’s losses.
Pitcher     A    B    C      D     E
Carlton    68   58  .540   113   104
Sutton     69   63  .523   138   121
Kaat       64   60  .516   118   103
John       63   63  .500   124   120
Niekro     62   64  .492   115   107
Jenkins    63   71  .470   103    95
Blyleven   56   75  .427   113   122
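Column C is column A divided by the sum of A and B. A short Python sketch recomputes it and reads columns D and E the same way. The data layout and names are illustrative.

```python
# One-run games: (pitcher W, pitcher L, team W, team L), columns A, B, D and E above.
one_run = {
    "Carlton": (68, 58, 113, 104),
    "Sutton": (69, 63, 138, 121),
    "Kaat": (64, 60, 118, 103),
    "John": (63, 63, 124, 120),
    "Niekro": (62, 64, 115, 107),
    "Jenkins": (63, 71, 103, 95),
    "Blyleven": (56, 75, 113, 122),
}


def pct(wins, losses):
    """Winning percentage: wins divided by decisions."""
    return wins / (wins + losses)


for name, (w, l, tw, tl) in one_run.items():
    print(f"{name:<9} pitcher {pct(w, l):.3f}   team {pct(tw, tl):.3f}")

worst = min(one_run, key=lambda name: pct(*one_run[name][:2]))
print("Lowest pitcher percentage:", worst)
```

By columns D and E, Blyleven's teams, at 113-122, are also the only ones in the group under .500.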

XI. Conclusion

Although Blyleven’s critics have made too much of his disappointing won-lost record, there is something there. Blyleven did not do an A+ job of matching his effort to the runs that he had to work with.

However, this probably should not be keeping him out of the Hall of Fame. Blyleven was 344 runs better than an average pitcher. The largest penalty that we could reasonably charge him for failing to match his best games with the games that he had a chance to win would be about 83 runs. That leaves him 261 runs better-than-league.

There are ten pitchers in history who are 260 to 280 runs better than league: Bob Feller, Eddie Plank, Ferguson Jenkins, Jack Stivetts, Ed Walsh, Clark Griffith, Rube Waddell, Old Hoss Radbourn, Juan Marichal and Dazzy Vance. All except Stivetts are in the Hall of Fame.

Also, look at it this way. Suppose that Blyleven has a seven-game stretch during which he wins games 13-0 and 5-2, but then loses 3-2, 4-3, 3-2, 7-4 and 3-2. Those are the actual scores of Blyleven’s games from May 3 to June 4, 1977. Blyleven was supported by 4.43 runs per game during that stretch and allowed 3.14, but he lost five of the seven games.
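The two averages in that paragraph come straight from the seven scores. In this Python sketch, each losing score is written winner first in the text, so it is flipped here to put Blyleven's team first. The count of one-run losses is an extra tally added for illustration, not a figure from the text.

```python
# Blyleven's games, May 3 to June 4, 1977, as (his team's runs, opponents' runs).
games = [(13, 0), (5, 2), (2, 3), (3, 4), (2, 3), (4, 7), (2, 3)]

support = sum(ours for ours, _ in games) / len(games)
allowed = sum(theirs for _, theirs in games) / len(games)
wins = sum(1 for ours, theirs in games if ours > theirs)
losses = len(games) - wins
one_run_losses = sum(1 for ours, theirs in games if theirs - ours == 1)

print(f"Support {support:.2f}, allowed {allowed:.2f}, record {wins}-{losses}")
# Support 4.43, allowed 3.14, record 2-5
print(f"One-run losses: {one_run_losses}")
# One-run losses: 4
```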

One can look at that and say that Blyleven failed to match his efforts to the runs he had to work with—but why is that all Blyleven’s fault? Isn’t it equally true that his offense failed to match their efforts to Bert’s better games? It seems to me that it is.

So why do we hold Blyleven wholly responsible for this? Wouldn’t it be equally logical, at least, to say that this was half Blyleven’s fault, and half his team’s fault?

Bill James
Ft. Myers, Florida
March 28, 2007

COMMENTS (most recent shown first)

One factor that might add some insight could be "tough no-decisions." Based on 4970IP (divided by 9), Blyleven "should" have had 552 career decisions, but only recorded 537. The other pitchers had a cumulative spot-on 3294 decisions in 29,644IP.

I remember thinking the same thing (in the opposite direction) reading in the Abstract about Lamar Hoyt's '83 Cy Young when he went 24-10 in 261IP (=29 expected decisions). BillJ wrote that he was taken out early with big leads often.

Theory: Starting pitchers "deserve" one decision per 9IP, and all their losses. "Too many" decisions implies a use pattern like Hoyt in '83; "too few" suggests pitching well but having to leave with a 1-1 tie. This shouldn't be too hard to study with the right database: Create a Pythagorean W-L% from offensive support and runs allowed. Multiply by expected decisions (IP/9). Compare to actual W-L. Are most "extra" decisions wins, losses or both equally?

Of course, this wouldn't affect the opinion of people who will point out that Blyleven is deficient in W's because he wasn't winning the games.
1:12 PM Mar 30th
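The study proposed in the comment above can be sketched in a few lines of Python. This is a hypothetical illustration, not anything from the article. It assumes the classic Pythagorean form (run support squared over the sum of run support squared and runs allowed squared), and the function names and the sample run totals in the second example are made up for illustration. Only the 4,970 innings and the 537 decisions come from the comment.

```python
def pythagorean_pct(run_support, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage (exponent 2 assumed)."""
    rs = run_support ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_decisions(innings):
    """The commenter's rule of thumb: one decision per nine innings pitched."""
    return innings / 9


def compare_record(wins, losses, run_support, runs_allowed, innings):
    """Hypothetical helper: actual W-L minus the Pythagorean expectation."""
    decisions = expected_decisions(innings)
    exp_wins = pythagorean_pct(run_support, runs_allowed) * decisions
    exp_losses = decisions - exp_wins
    return {
        "expected_decisions": decisions,
        "extra_wins": wins - exp_wins,
        "extra_losses": losses - exp_losses,
    }


# From the comment: 4,970 innings implies about 552 decisions; Blyleven had 537.
shortfall = 537 - expected_decisions(4970)
print(f"{expected_decisions(4970):.1f} expected, {shortfall:+.1f} actual minus expected")

# Made-up numbers, purely to show the shape of the output.
print(compare_record(60, 40, run_support=600, runs_allowed=500, innings=900))
```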
I like the method used in this article to estimate the components of these pitchers' level of runs saved vs. W-L discrepancy and how much is attributable to overall run support vs. ability to match up with the opponent. It would be interesting to see a broader study on pitchers. Jerry Koosman and Robin Roberts certainly come to mind as pitchers who might have Blyleven-like characteristics. Then there are those who are known for the opposite characteristics of Blyleven, i.e., those who are believed to be particularly strong at matching the opponent. This could be first measured in winning percentages relative to their runs saved or ERAs, then any who were high on this measure could be analyzed vs. a peer group. I am thinking specifically about Jack Morris, Dave McNally (known in my youth as "Dave McLucky"), Andy Pettitte, Don Drysdale... How much of their success was due to run support and superior defensive play and how much to their ability?
11:35 AM Mar 20th
©2022 Be Jolly, Inc. All Rights Reserved.