The Early Cy Young Seasons

By Bill James

February 20, 2019

By coincidence, today’s installment of the pitcher analysis begins with the 1956 Cy Young Award, which was won by Don Newcombe, who passed away yesterday. In this analysis I concluded that Newcombe DID in fact deserve the 1956 Cy Young Award. This was a surprise to me, as in previous attempts to study the issue I had believed that he did not. The comments about Newcombe were not edited in any way as a consequence of his death. It is my opinion at this time that he was the best pitcher in the majors in 1956.

Later on here, as I warned you that I would, I am going to repeat some segments which were printed earlier, which have to do with Jack Kralick and the 1961 pitchers. These articles were written to appear here, as they do today. Earlier I took them out of sequence and published them separately, but I am repeating them today so that you have the opportunity to read them in context if you want to do that.

1956 AL—Herb Score (D-WAR) vs. Early Wynn (R-WAR)

Score and Wynn were both 20-9 for the same team (Cleveland) and with almost the same ERA (2.63 for Score, 2.72 for Wynn). Whatever the difference was between them, it’s probably not worth arguing about.

1956 NL—Don Newcombe (D-WAR) vs. Johnny Antonelli (R-WAR)

Cy Young Winner: Newk

Newcombe (I would assume most of you know this, but you never know) was 27-7 and won not only the first Cy Young Award, but also the MVP Award. All modern methods of analysis would agree that he was probably not the Most Valuable Player, or even an especially good Cy Young selection, and Baseball Reference does not list him among the ten best players in the league, which seems odd. Johnny Antonelli won 20 games (20-13) and had a better ERA than Newcombe, although almost the same ERA+ (132-131). I would go with Newcombe based on his consistently pitching well.

1957 AL—Jim Bunning (D-WAR) vs. Frank Sullivan (R-WAR)

Bunning started the 1957 season in the bullpen, a young, unproven pitcher, but pitched well out of the bullpen, moved into the starting rotation in late May and wound up 20-8 with a 2.69 ERA, also leading the league in innings pitched, which would seem to be difficult to do when you spend six weeks as a reliever. It’s actually a very similar season to Ernie Broglio in 1960, somewhat similar to Bob Friend in 1955. It was the only time in Bunning’s great career that he won 20 games, although he went 19-8, 19-9, 19-10 and 19-14 in various seasons.

Frank Sullivan was a 6-foot-7-inch pitcher who was once traded for Gene Conley, who was 6-foot-8, in what was called the tallest trade ever made. Sullivan later joked that he was in the twilight years of a mediocre career, but he was selling himself short, no pun intended. Sullivan was in the top 10 in WAR by a pitcher in 1954-55-56-57. In 1957 Sullivan was 14-11 with a 2.73 ERA. Baseball Reference WAR sees Sullivan as the best pitcher in the league, leading Jim Bunning 6.4 to 6.3. My new method (D-WAR) also has them 1-2, but has Bunning a little further ahead (8.2-7.5). Bunning pitched significantly more innings with a better ERA, significantly more strikeouts, dramatically better hits/9 innings, better wins and losses for whatever that is worth. I can see the argument for Sullivan but I would much rather have Bunning.

1957 NL—Don Drysdale (both systems)

Cy Young Award went to Warren Spahn.

1958 AL—Bob Turley (D-WAR) vs. Frank Lary (R-WAR)

This one is extremely interesting in my opinion. I have long believed that the selection of Bob Turley as the Cy Young Award winner in 1958 was a poor if not idiotic selection. This comparison calls that into question.

There is no missing data here; I have all of the starts for each pitcher. D-WAR, to my surprise, shows Cy Young Award winner Turley as the best pitcher in the league, edging his teammate Whitey Ford 7.9 to 7.8, with Frank Lary fourth at 6.1. R-WAR, on the other hand, has Lary first at 6.7, with Turley a distant 10th at 3.6. There is about a 49-run disparity between the conclusions of the two approaches. Lary pitched five times in relief but pitched poorly in relief, so that doesn’t explain the discrepancy or contribute to the explanation.
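The 49-run figure can be roughly reconstructed from the WAR values just quoted. This is only a sketch: it assumes the common rule of thumb of about 10 runs per win, which the text does not state, and the variable names are illustrative.

```python
# WAR values quoted in the text for 1958.
d_war = {"Turley": 7.9, "Lary": 6.1}
r_war = {"Turley": 3.6, "Lary": 6.7}

# ASSUMPTION: roughly 10 runs per win (a rule of thumb, not stated in the article).
RUNS_PER_WIN = 10

# How far each system puts Turley ahead of Lary (negative means Lary is ahead).
d_gap = d_war["Turley"] - d_war["Lary"]
r_gap = r_war["Turley"] - r_war["Lary"]

# The disagreement between the two systems, in wins and then in runs.
swing_wins = d_gap - r_gap
swing_runs = swing_wins * RUNS_PER_WIN

print(f"D-WAR gap: {d_gap:+.1f}, R-WAR gap: {r_gap:+.1f}")
print(f"Disparity: {swing_wins:.1f} wins, about {swing_runs:.0f} runs")
```

D-WAR has Turley 1.8 wins ahead of Lary, R-WAR has him 3.1 wins behind, and the two gaps together make up the disparity.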

So it is a conflict which, as the lawyers say, is directly on point. It doesn’t result from missing data; it doesn’t result from relief work, and it doesn’t have anything to do with won-lost records. There is simply something about the way the two systems are analyzing the data which leads to radically different conclusions. So what is it?

Another thing that it isn’t is un-earned runs. Turley was charged with only 1 un-earned run; Lary, with 16. This would favor Turley in the R-WAR type of analysis. The one that he lost.

OK, so what is it? Why does the R-WAR system favor Lary, rather than Turley?

As nearly as I can figure, it comes down to four things.

First, the R-WAR system believes that the defensive support for Turley was much better than for Lary. The D-WAR system just doesn’t worry about it. The R-WAR system could be right, I don’t know, but it’s only about a 4-run contribution to a discrepancy of almost 50 runs.

Second, Turley has a quite exceptional rate of hits allowed for that era, allowing 178 hits in 245 innings, while Lary allowed 249 hits in 260 innings. The D-WAR system gives him some credit for this, and this gives him some advantage in the D-WAR approach. The R-WAR approach, as I understand it, pays no attention to that fact.

Turley also walked 128 batters compared to Lary’s 68, so that would cancel out most of the difference in hits. Still, some little advantage there for Turley in the D-WAR system.
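To put the hits and walks on the same scale, here is a small sketch that converts the totals quoted above into per-nine-inning rates. The function and dictionary names are illustrative, and nothing here reproduces how either WAR system weighs these events.

```python
def per_nine(count, innings):
    """Rate of an event per nine innings pitched."""
    return 9 * count / innings

# 1958 totals quoted in the text: (hits, walks, innings).
pitchers = {"Turley": (178, 128, 245), "Lary": (249, 68, 260)}

rates = {}
for name, (hits, walks, innings) in pitchers.items():
    rates[name] = {
        "H/9": per_nine(hits, innings),
        "BB/9": per_nine(walks, innings),
    }
    print(f"{name}: {rates[name]['H/9']:.2f} H/9, {rates[name]['BB/9']:.2f} BB/9")
```

Turley allowed about two fewer hits per nine innings than Lary, and a little more than two extra walks.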

Third, as I understand it, the R-WAR approach makes park adjustments based on multi-year data, whereas the D-WAR approach makes Game Score adjustments based on data from that year only. This works, in this case, pretty strongly in Lary’s favor. Tiger Stadium (Lary) has a multi-year park factor of 107, a single-year factor of 102. Yankee Stadium (Turley) has a multi-year factor of 94, but a single-year factor of 99.

In other words, there is very little difference in the park factors for the two pitchers, based on the 1958 data alone (102 to 99). But there is a very significant difference based on the multi-year data (107 to 94.) Applying the multi-year park factor makes Lary look better, and Turley look worse.
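A rough illustration of why the choice matters: take two pitchers with the same raw runs allowed and adjust them with each set of park factors. The 4.00 runs-per-nine figure is hypothetical, and dividing by the park factor is a deliberately simplified adjustment, not the formula either system actually uses.

```python
# Park factors quoted in the text (100 = neutral park).
park_factors = {
    "Lary (Tiger Stadium)": {"multi_year": 107, "single_year": 102},
    "Turley (Yankee Stadium)": {"multi_year": 94, "single_year": 99},
}

# HYPOTHETICAL: both pitchers allow the same raw 4.00 runs per nine innings.
RAW_RA9 = 4.00

def naive_park_adjust(ra9, park_factor):
    # ASSUMPTION: simplified adjustment for illustration only.
    return ra9 / (park_factor / 100)

adjusted = {
    kind: {name: naive_park_adjust(RAW_RA9, pf[kind]) for name, pf in park_factors.items()}
    for kind in ("multi_year", "single_year")
}

for kind, values in adjusted.items():
    for name, ra9 in values.items():
        print(f"{kind:>11}  {name}: {ra9:.2f}")
```

Under the single-year factors the two adjusted figures stay close together; under the multi-year factors the same raw performance looks about half a run per nine innings better in Tiger Stadium than in Yankee Stadium.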

As to whether this is right, wrong or otherwise, I have no idea. Both the one-year and the multi-year approaches raise serious issues. Neither is the right approach. We just do the best we can.

The fourth factor, and I think the largest factor, has to do with consistency. A study of the pitcher’s game-by-game results shows clearly that Turley had a consistency advantage, of a type that would contribute to more expected wins for his team.

In 31 starts, Turley met his target score for the game 24 times, and missed it only 7. Lary met his target score 20 times, but missed it 14 times. This difference is missed, in the aggregate, because Turley had two really terrible games. On September 23 in Fenway Park, Turley gave up 9 hits and 7 runs, all earned, in 3 innings. It was his last start of the season; the Yankees had clinched the pennant several days before, and had virtually wrapped it up long before that, so. . .questionable whether that game should be counted at all, since it had no impact on the pennant race. It does count, in my analysis, but it’s just one game in which he has no chance of getting his team a win.

Turley was -38 vs. the Target Score in that game, and he had another game in which he was -30. Lary, on the other hand, had NO games in which he was worse than -24, and only two games in which he was worse than -16. But he had SEVEN games in which he was -14 to -17. He had a lot of games, in other words, in which he was just kind of pukey. In helping your team win, a lot of pretty bad games are worse than a few really bad games. Lary stuck his team with a bad outing (-14 or worse) 9 times; Turley, 3 times.

So who deserved more the recognition as the league’s number one pitcher? Well, certainly Turley has a much stronger case than I realized that he did before doing this research. I think, knowing what I know at this moment, that I would probably vote for Turley. Turley, not based on his offensive support, but just based on how well he pitched in each game vs. the expectations for that game, had a Deserved Won-Lost record of 17-8. Lary’s Deserved Won-Lost Record was 15-11.

But there is this. Lary’s nickname was "The Yankee Killer". 1958 was one of the main reasons he got that nickname. He was 7-1 against the Yankees in 1958, with a 1.86 ERA. But he was 9-14 against the other six teams.

1958 NL—Sam Jones (D-WAR) vs. Robin Roberts (R-WAR)

Prejudice against black players did not evaporate immediately after Jackie Robinson, Larry Doby and Branch Rickey broke the color line. There are also players, like Minnie Minoso and George Crowe, who got a chance to play after 1950, but not a full chance, not the same chance they would have gotten had they been white.

Sam Jones was one of those. A big man who pitched with a toothpick in his mouth and the best curveball in the National League, Jones was major league ready by 1950, when he was 24 years old, but didn’t get a real chance to pitch until 1955, when he was 29. He led the National League in strikeouts in his first year, also in walks and losses. He was the #1 pitcher on the Cubs, a tail-end team. In 1958, pitching for St. Louis, he led the majors in strikeouts, with 225, and had a 2.88 ERA in 250 innings. He led the National League in strikeouts by 75 (225-150), and had the best ERA of any starting pitcher. Stu Miller, with 20 starts and 21 relief appearances, got enough innings to take the ERA title (2.47), but Jones had the best ERA of any starter.

Robin Roberts by 1958 was not the Beast that he had been from 1950 to 1955. Now pitching for a bad team, he had led the National League in losses in 1956 (19-18) and 1957 (10-22). In 1958 he had basically the same peripheral stats as in 1957, but cut his home runs allowed from 40 to 30, pitched 270 innings and finished 17-14 with a 3.24 ERA.

Sam Jones was a year OLDER than Robin Roberts, which will seem odd to anybody who remembers the era because Roberts was the best pitcher in baseball from 1950 to 1955 and Jones was basically a rookie in 1955, but Jones was actually older. He lost many or most of his prime seasons, perhaps in part to wildness, but more to prejudice. Baseball Reference has Roberts and Jones both at 6.3 WAR in 1958, basically tied for the lead although they recognize Roberts as first. My system has Jones significantly ahead (8.8 to 6.6), with Warren Spahn sandwiched between them.

You know what else these two men have in common? Modern media figures, popular on TV, who share their name. You can’t just google either one of them by his name, or you get the other Robin Roberts or one of the other Sam Joneses.

1959 AL—Camilo Pascual (both systems)

Early Wynn won the Cy Young Award, but Pascual was pretty clearly better. Wynn had a deserved Won-Lost record of 17-12; Pascual, of 16-7.

1959 NL—Don Drysdale (D-WAR) vs. Larry Jackson (R-WAR)

Three NL pitchers won 20 games in 1959, and all three of them had the same won-lost record: 21 and 15. 21-15 is not a normal Cy Young type record. Braves manager Fred Haney worked his two aces, Spahn and Burdette, into the ground; Sam Jones was the third 21-15 pitcher.

Drysdale, of course, is a Hall of Famer and was the #1 pitcher on the World Championship team. 1959 was far from his best year; he was 17-13 with a 3.46 ERA and a major-league leading 242 strikeouts, working in the Los Angeles Coliseum, which gave up cheap home runs.

Larry Jackson seems to be. . .well, not as weak a selection as Jack Kralick in ’61 or Thornton Lee in 1938, but along that line. He pitched 256 innings—less than Drysdale, Spahn, Burdette or Sam Jones—with a 3.30 ERA, 14-13 record. Baseball Reference for some reason credits him with 7.3 WAR, and puts Drysdale third, with 5.9; Vern Law was between them. My system almost reverses those numbers, giving Drysdale 7.4 WAR and Larry Jackson 6.1.

My analysis puts Jackson 6th, behind Drysdale, Spahn, Law, Sam Jones and Johnny Antonelli. I have Drysdale as deserving of a 17-11 record, Jackson 16-12. There’s not a huge difference between them, but I do believe that Drysdale was better.

1960 AL—Jim Bunning (both systems)

Bunning had a classic low-run support season, leading the league in strikeouts (201) and having exceptional numbers across the board, except for the won-lost record. He pitched 252 innings, 2.79 ERA, 201-64 strikeout/walk ratio and only 20 home runs allowed. Poor offensive support stuck him with an 11-14 won-lost record.

1960 NL—Don Drysdale (D-WAR) vs. Ernie Broglio (R-WAR)

Cy Young Award—Vern Law

Drysdale, like Bunning, had terrific numbers in everything except wins and losses, pitching 269 innings with a 2.84 ERA, 246 strikeouts and only 72 walks. He finished just 15-14.

Broglio, working out of the bullpen until July 4th, made 24 starts and 28 relief appearances, and consequently cannot be accurately evaluated by my method. Baseball Reference ranks Broglio ahead of Drysdale, 7.2 to 7.1, with the Cy Young Award Winner, Vern Law, in 9th place at 4.2.

The Luckiest Pitchers of All Time (Career)

A side benefit of this process is that it enables us to measure in objective terms exactly how God Damned Lucky a pitcher was, in a season or in a career, and thus allows us to produce lists of the luckiest and unluckiest pitchers ever, in a season or a career.

The luckiest pitcher of all time is not a particularly interesting selection; it’s Kirk Rueter, who came up with Montreal, and pitched for San Francisco in the Barry Bonds era. These are the top ten:

First	Last	Actual W	Actual L	D-Win	D-Loss	D-WPct	D-Luck
Kirk	Rueter	129	92	108	135	.444	64
Whitey	Ford	226	99	199	133	.599	61
Andy	Pettitte	255	152	219	173	.559	57
Tom	Glavine	305	203	279	232	.546	55
Herb	Pennock	149	87	106	95	.527	51
Freddie	Fitzsimmons	180	122	145	138	.513	51
Tommy	John	284	228	264	258	.506	49
Allie	Reynolds	141	76	110	93	.540	49
Mike	Torrez	183	157	158	180	.467	48
Lew	Burdette	159	126	130	143	.476	46

A list entirely of pitchers who pitched most of their careers for outstanding teams, for obvious reasons, but let’s focus on Whitey Ford. Ford in his career was 227-99 as a starter, so I apparently am missing one start from him, and then he was 9-7 as a reliever. He was a great pitcher, with a Deserved Won-Lost record of 199-133, but he picked up about 30 games over that level because he pitched for pennant-winning teams in 11 of the 14 seasons that he was in the rotation.
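The D-Luck column is not defined in the text, but it appears to be the number of extra wins plus the number of avoided losses. The sketch below checks that guess against the table. The formula is an inference, not the author's stated method. It reproduces eight of the ten rows exactly and is off by one for Tommy John and Allie Reynolds, presumably because the deserved totals shown are rounded.

```python
# (first, last, actual W, actual L, deserved W, deserved L, listed D-Luck) from the table.
rows = [
    ("Kirk", "Rueter", 129, 92, 108, 135, 64),
    ("Whitey", "Ford", 226, 99, 199, 133, 61),
    ("Andy", "Pettitte", 255, 152, 219, 173, 57),
    ("Tom", "Glavine", 305, 203, 279, 232, 55),
    ("Herb", "Pennock", 149, 87, 106, 95, 51),
    ("Freddie", "Fitzsimmons", 180, 122, 145, 138, 51),
    ("Tommy", "John", 284, 228, 264, 258, 49),
    ("Allie", "Reynolds", 141, 76, 110, 93, 49),
    ("Mike", "Torrez", 183, 157, 158, 180, 48),
    ("Lew", "Burdette", 159, 126, 130, 143, 46),
]

def d_luck(wins, losses, deserved_wins, deserved_losses):
    # ASSUMPTION: inferred from the table, not a formula given in the article.
    return (wins - deserved_wins) + (deserved_losses - losses)

computed = {}
for first, last, w, l, dw, dl, listed in rows:
    computed[last] = d_luck(w, l, dw, dl)
    print(f"{first} {last}: computed {computed[last]}, listed {listed}")
```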

On Un-Earned Runs

Early in my career, I questioned whether the concept of an "error" was useful (as opposed to useless) and whether it was legitimate (as opposed to arbitrary.) To say that I "questioned" the concept is being unnecessarily kind to my younger self. I attacked the concept. The concept of an error, I argued, made an arbitrary distinction between events which were otherwise much the same. Two balls are hit to center field. The center fielder doesn’t field either one. In both cases the batter winds up at second base—but one of these is recorded as an error, and the other is recorded as a double. Why is that?

It is because a third party, not connected to the game in any necessary manner, has made a judgment about what SHOULD HAVE happened in one case—the center fielder should have caught that ball—and has entered his judgment into the record. This is not what records are supposed to be. The statistical record is supposed to be an account of what has happened, not a record of subjective judgments about what SHOULD have happened if everybody had played better.

As far as I know I was the first person to make that argument—maybe I was or wasn’t, who knows—but in any case it has entered the marketplace of ideas, and a substantial number of people now see the issue that way. But in retrospect. . . .welllll.

The principle that the record is supposed to be an objective record of what has happened, without the extraneous input of value judgments, still seems to me to be a valid principle. It is, however, not the ONLY valid principle that instructs the business of record-keeping. Another valid principle is that we want the record books to include as much information as we can get. Another valid principle is that unlike events should not be recorded as if they were the same.

Suppose that there are three ground balls to the shortstop side of third base, hit on the same line with the same velocity and the same number of hops, etc. In one case, the third baseman picks up the ball, throws to first, and the out is recorded. In the second case, the third baseman picks up the ball and throws it to where the first baseman could never possibly catch it. In the third case, the third baseman is in a shift, and there is no possible way he could make a play on the ball.

The elimination of the concept of the error (a) eliminates information from the record book which would be useful to us if we knew it, and (b) causes unlike events to be recorded as if they were the same. Also, in making the opposing argument, I argued that it was generally and usually difficult to say whether something was a hit or an error. I don’t know, in retrospect, that that was absolutely true. It is often true, a point that has been brilliantly illustrated by Brian Kenny with his "Hit, or Error" videos, but I also think that I over-stated the point years ago.

I am not saying that the younger Bill James was entirely wrong, or that you should not pay attention to him. I was making a useful point, I think. I am saying that you shouldn’t take the point too literally.

Jack Kralick in 1961 was 13-11 with a 3.61 ERA, but is identified by Baseball Reference as the best pitcher in the American League that season. This is part of what is going on with the Jack Kralick/1961 problem, one of the largest issues. If you obliterate the reliance on errors then un-earned runs are the same as earned runs. The design of R-WAR takes that literally, and treats earned runs the same as un-earned runs. Kralick allowed a very low number of un-earned runs. We are used to evaluating pitchers in large part based on the Earned Run Average, but Kralick looks much, much better when the un-earned runs are treated the same as the earned runs. There are 40 American League pitchers who qualified for the ERA championship in 1961. In ERA, Kralick ranks 18th among the 40. He is near the center of the chart. If you rank them by TOTAL runs allowed per nine innings, however, Kralick leapfrogs Jim Bunning, Bennie Daniels, Don Schwall, Frank Lary, Rollie Sheldon, Bill Monbouquette, Camilo Pascual and Jim Archer, and thus moves up to 10th among the 40 pitchers. This is one of the largest reasons for the Kralick anomaly.
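The gap between the two measures can be worked out for Kralick himself from figures given elsewhere in this article: 97 earned runs, four un-earned runs and 242 innings. A minimal sketch, with illustrative names:

```python
def runs_per_nine(runs, innings):
    """Runs allowed per nine innings pitched."""
    return 9 * runs / innings

# Kralick's 1961 figures as quoted elsewhere in this article.
earned_runs = 97
unearned_runs = 4
innings = 242

era = runs_per_nine(earned_runs, innings)                   # earned runs only
ra9 = runs_per_nine(earned_runs + unearned_runs, innings)   # all runs

print(f"ERA: {era:.2f}")
print(f"Total runs per nine innings: {ra9:.2f}")
print(f"Difference: {ra9 - era:.2f}")
```

Counting every run raises Kralick's figure by only about 0.15, so any pitcher whose un-earned runs add more than that to a similar ERA drops behind him.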

There are six reasons for the Kralick anomaly:

1) That there is no pitcher in 1961 who is clearly better than the others, and there is no little group of three or four who can be easily distinguished from the others. This creates a cluster of pitchers who rate as being close to one another. When there is a cluster of players who are about the same, that makes small adjustments relatively larger.

2) The fact that the R-WAR system treats unearned runs the same as earned runs, which is not natural to us.

3) The fact that the system makes park adjustments, which most of us still do not do most of the time in making a casual assessment of a player. The intensity of emotion supporting Jacob deGrom in 2018 occurred in substantial part because people don’t make park adjustments in their heads, and thus overrated the extent to which deGrom was dominating the league. (I agree that deGrom probably was the best pitcher in the league, but when you factor in the park adjustment, it’s actually a razor-thin margin, rather than a wide margin. But most people don’t know that, because most people don’t make park adjustments in their heads as a routine matter.)

Kralick benefits from the park adjustment, and this is absolutely legitimate. But whereas Kralick worked in a hitter’s park, all nine of those who were ahead of him in Runs Allowed per 9 innings worked in pitcher’s parks. Eight of the nine worked in parks with Park Factors in the 80s. Thus, when you park-adjust their RUNS allowed per nine innings, Kralick vaults from 10th in the league to 4th.

4) Park Factors can be measured either as one-year or multi-year elements, neither of which exactly works, and either of which causes problems. Baseball Reference, I am informed, uses a multi-year approach.

5) We are, of course, not paying attention to his won-lost record, which most of us still look at and still place some weight on in evaluating pitchers.

6) The system is assuming that Kralick suffers from bad defense, and that this is costing him .09 runs per nine innings, or two and a half runs on the season (the category shown as RA9def in the Baseball Reference charts.) When you make THAT adjustment, because the differences among these pitchers are small, Kralick passes two of the three pitchers still ahead of him on the Runs Allowed per 9 innings Park Adjusted chart, moving him into second place. Since Kralick pitched far more innings than the other pitchers with comparable RA9 park-adjusted, he shows as the most valuable pitcher in the league.
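As a quick check on the size of that defensive adjustment, the per-nine figure can be converted to a season total using Kralick's 242 innings, a figure quoted elsewhere in this article. The variable names are illustrative.

```python
# Defensive adjustment quoted in the text, in runs per nine innings.
ra9def = 0.09

# Kralick's 1961 innings, as quoted elsewhere in this article.
innings = 242

# Scale the per-nine rate up to the full season.
season_runs = ra9def * innings / 9

print(f"Defensive adjustment over the season: {season_runs:.2f} runs")
```

That comes to a little under two and a half runs, in line with the figure in the text.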

But the RA9def adjustment is highly suspect. I am not saying that pitchers don’t benefit or suffer from good or bad defense behind them, or that we should not attempt to adjust for that. I am not saying that the Baseball Reference adjustments are always wrong or generally wrong, or even that two and a half runs is not a reasonable estimate in the case of Jack Kralick. What I am saying is that it is EXTREMELY difficult to know what the actual impact of the defense on the pitcher has been, even now when we have tons of data to work with, and it is highly speculative in regard to any one particular pitcher when working with the much more limited data of 1961.

What I said before was that there is a very narrow pathway that leads to the conclusion that Jack Kralick was the best pitcher in the American League in 1961—or the conclusion that Don Cardwell was the best pitcher in the National League and the best pitcher in baseball. There is ONE pathway through the numbers that leads to that conclusion. To reach that point, you have to believe:

1) That won-lost records shouldn’t count,

2) That earned runs should be counted the same as un-earned runs,

3) That runs allowed averages must be park-adjusted,

4) That the Minnesota Park Effect in 1961 was actually +18%, rather than that it merely happened that there were some high-scoring games in Minnesota in 1961, and

5) That despite the fact that he was charged with only four un-earned runs, Kralick suffered from very poor fielding support.

My point is not that any of those assumptions is wrong. It is that all of them are questionable at one level or another, and that all five of them are necessary to reach the conclusion that Kralick is the best pitcher in the league. The one pathway that happens to lead to that conclusion is the one pathway chosen by the R-WAR system.

And that is just what happens when you design rating systems. You HAVE to make choices that are not absolutely 100% right 100% of the time. You choose the set of assumptions that you think are 51% right in one case and 55% right in another case and 80% right in one case and 99% right in another case. And sometimes the sum total of those assumptions is great, and sometimes it is good, and one time in a hundred the sum total of those assumptions is going to lead you out in the forest and drop you off on Jack Kralick’s doorstep.

Back to 1961

So back to the 1961 season, which we talked about at some length earlier in this series of articles. Who was the best pitcher in the majors in 1961?

On the level of "Cumulative Margin", the number one pitcher in the majors in 1961 would be Camilo Pascual of the Minnesota Twins. I know I have reached THAT conclusion before, using a methodology very similar to this one. If you rummage through the articles on this site, you can find an article in which I made many of the same adjustments that I made here, and concluded that Pascual was the best pitcher in baseball in 1961.

I’m not going to reach that conclusion this time; we’re not at the finish line yet. But let’s take on this question. Jack Kralick and Camilo Pascual were teammates in 1961, with similar records. They made 33 starts each, gave up 97 earned runs each. Pascual was 15-16 with a 3.46 ERA; Kralick was 13-11 with a 3.61 ERA, although, as we know, Kralick allowed fewer TOTAL runs, just a higher percentage of them were earned as opposed to un-earned. So the question is, if it is not credible to argue that Kralick was the best pitcher in the league, but merely has a record that basically disguises this because of park effects and run support. . .sorry. The question is, if it is not credible to argue that Kralick is the best pitcher in the league, why is it credible to argue that Pascual is the best pitcher in the league?

Two reasons. First, Pascual was probably the best pitcher in the American League in 1959, was a highly effective, top-3 American League pitcher in 1960 although he missed starts with injury, won 20 games in 1962 and was 21-9 with a 2.46 ERA in 1963. He led the American League in WAR for pitchers in both 1959 and 1963. In 1959 he led the American League in WAR, period, finishing ahead of the ERA leader (Hoyt Wilhelm), ahead of Mickey Mantle (third) and the MVP, Nellie Fox (fourth). He was a great pitcher in 1959, 1960, 1962 and 1963; it is not such a stretch to argue that he was the best pitcher in the league in 1961, but merely happened to have a combination of park and run support which made it look like he wasn’t.

Second, Pascual led the American League in strikeouts in 1961, and gave up 205 hits in 252 innings, whereas Kralick had a nothing strikeout rate and gave up 257 hits in 242 innings. Most people realize now that strikeouts are not Christmas decorations; they actually matter.

I don’t have any doubt that Pascual was a better pitcher than Jack Kralick in 1961, as he was in every other season. Pascual threw eight shutouts in 1961—eight. That’s a big number. Even back when complete games were common, a lot of people would lead the league in shutouts with four. But it’s a very good question. If Kralick isn’t a credible candidate for the best pitcher in the league, is Pascual?

Consistency

The impact of pitching well on getting a win, or on your team’s getting a win, is not a straight-line impact. To begin with, one might assume that if you meet the Target Score for a game—that is, if you pitch as well as an average pitcher might be expected to pitch in the same circumstances—that you should have, and your team should have, an expected winning percentage of about .500. This is not quite true. If you exactly meet the Target Score for the game, your expected winning percentage for the game is .450. Your team’s expected winning percentage is higher, up a little short of .500, but if you just meet expectations, don’t exceed them, there’s a good chance that the bullpen will get the decision.

To get an expected winning percentage of .500, you actually have to exceed the Target Score by 1.43 points. If you exceed the Target Score by 5.00 points, you have an expected winning percentage, as a starting pitcher, of .572. Those five points in your Game Score—basically one clean inning—increase your expected winning percentage by 122 points.

But does EVERY five-point improvement in Game Score increase your winning percentage by 122 points? Obviously it can’t. That would mean that, if your Game Score was 25 points above the Target Score, your expected winning percentage would be greater than 1.000, which is of course impossible.
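The arithmetic behind that impossibility is short enough to write out. This sketch uses only the two figures given above (.450 at the Target Score and a 122-point gain for the first five points) and extends them in a straight line, which is exactly the assumption being ruled out.

```python
BASE_WPCT = 0.450       # expected winning percentage at exactly the Target Score
GAIN_PER_FIVE = 0.122   # gain from the first five points above the Target Score

def linear_wpct(margin):
    """Expected winning percentage if every five points added the same 122 points."""
    return BASE_WPCT + GAIN_PER_FIVE * (margin / 5)

for margin in (0, 5, 10, 15, 20, 25):
    print(f"+{margin:>2}: {linear_wpct(margin):.3f}")
```

At +25 the straight line reaches 1.060, so the gain per point has to shrink somewhere along the way.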

I am making a point here which many, many people have made before in other forms, and which most of you, I would guess, already know to be true: That pitching well in a game has the greatest impact when the expectation is near .500. The Target Score for a game is almost always near 50. If the Target Score is 50, the difference in the pitcher’s expected winning percentage between hitting 55 and 50 (Game Score) is enormous. The difference between 95 and 90 is tiny, tiny, tiny. If your game score is 70, 75, 80, you’re probably going to win the game anyway. At a Game Score of 70, a pitcher’s expected winning percentage is about .825. There’s not that much room for it to go higher.

The point is that for a starting pitcher, consistently good performance is more efficient than occasionally brilliant performance of the same overall quality. This is not a universal truth in regard to brilliance versus consistency. For a Hall of Fame candidate, for example, the opposite is true: brilliant seasons count more, and should count more, than good seasons adding up to the same total. The reason this is true is that BIG seasons, in a career, have more impact on your team’s pennant chances. But for a starting pitcher, consistency is better than inconsistent dominance.
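A toy comparison makes the efficiency argument concrete. The curve below is a hypothetical S-shaped function, not the author's actual win-expectancy data: it is centered on the 1.43-point break-even margin given above, with a slope picked so that +5 lands near the .572 quoted earlier. Both pitchers and their margins are invented for illustration.

```python
import math

# ASSUMPTION: a hypothetical S-shaped curve, not the author's actual table.
BREAK_EVEN = 1.43   # margin the article gives for a .500 expectation
SLOPE = 0.08        # illustrative; puts +5 near the article's .572

def expected_wpct(margin):
    """Hypothetical expected winning percentage for a Game Score margin."""
    return 1 / (1 + math.exp(-SLOPE * (margin - BREAK_EVEN)))

def season_wpct(margins):
    """Average expected winning percentage over a list of starts."""
    return sum(expected_wpct(m) for m in margins) / len(margins)

# Two hypothetical pitchers who both average +10 versus the Target Score.
steady = [10] * 30           # +10 every start
streaky = [35, -15] * 15     # alternating brilliant and poor starts

print(f"Steady:  {season_wpct(steady):.3f}")
print(f"Streaky: {season_wpct(streaky):.3f}")
```

Under this assumed curve the steady pitcher comes out well ahead even though the total margin is identical, because the brilliant starts run into the ceiling while the poor starts give back most of the expected win.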

A few articles back in this series, when I was listing the positives and negatives about this line of analysis, I listed "consistency" as one of the benefits of this line of analysis. This article is trying to explain why. If you evaluate a pitcher’s season by his season totals, you miss the issue of consistency. A pitcher has 200 innings with an ERA of 3.60; you don’t know whether he has been consistent or not—but it actually does make a difference. That’s why this is a benefit to this approach, missing from many other approaches.

This also has to do with Camilo Pascual in 1961. Pascual was the best pitcher in baseball in 1961, if you just compare his Game Scores to his Target Scores, and total up the margin. But remember what I said: Pascual threw eight shutouts in 1961.

Pascual in 1961 pitched a lot of brilliant games. Pascual in 1961 had 12 games in which his Game Score was +25 versus the Target Score, and 7 games in which he was +33 or more. But he also had 12 starts in 1961 in which his Game Score was below the Target. That’s a little high, for a pitcher of that quality. He had three starts in which he was 23 points or more below the Target Score.

Pascual was not the best pitcher in baseball in 1961 because he was inconsistent. This is not a general observation about Pascual. He was not inconsistent in 1959 or in 1963; in those seasons he had normal numbers of dominant games and off games. But in 1961, he had a measurable and meaningful inconsistency in his game-to-game performance.

1961 AL—Whitey Ford (D-WAR) vs. Jack Kralick (R-WAR)

Cy Young Award—Whitey Ford

To be honest, I started this series of articles mostly because of my curiosity about the 1961 Cy Young race. In 1961 it was a consensus opinion that Whitey Ford was the best pitcher in the league. He won 25 games and lost only 4—the best won-lost record in the majors not only that year, but the best in at least five years. We all see clearly now the other side of the argument. Ford "won" games not because he individually was brilliant, but because the Yankees scored 5.67 runs per game for him, and Luis Arroyo picked up the pieces any time he got into trouble. Ford "won" 11 straight starts between June 2 and July 8, not just 11 straight decisions but 11 straight starts, winning 11 games in just over a month. He ran his record to 20-2 before losing a couple of games toward the end. It was regarded at the time as a historic season.

One thing you probably don’t know about the season is that Ford had 10 starts with no decision, and the Yankees won 9 of the 10. They were 34-5 in Ford’s starts—a more impressive stat, certainly, than his 25-4 mark as an individual pitcher. The Yankees were 109-53, and over half of their margin (wins minus losses) was in the games that Ford started. They were +29 with Ford on the mound, +27 with other starters.
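The numbers in that paragraph fit together as follows. The sketch assumes that all of Ford's decisions came in his starts, which the text implies but does not state, and the variable names are illustrative.

```python
# Figures quoted in the text for 1961.
ford_wins, ford_losses = 25, 4
no_decisions, team_wins_in_no_decisions = 10, 9
team_wins, team_losses = 109, 53

# ASSUMPTION: every Ford decision came in a game he started.
ford_start_wins = ford_wins + team_wins_in_no_decisions
ford_start_losses = ford_losses + (no_decisions - team_wins_in_no_decisions)

# Team margin (wins minus losses) with Ford starting and with everyone else.
ford_margin = ford_start_wins - ford_start_losses
other_margin = (team_wins - team_losses) - ford_margin

print(f"Yankees in Ford's starts: {ford_start_wins}-{ford_start_losses}")
print(f"Margin with Ford: +{ford_margin}, with other starters: +{other_margin}")
```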

Ford as an individual could easily have been 27-4. On September 1 against the Tigers, Ford was pitching a shutout, but was taken out of the game after 4 2/3 innings with a mild hip strain. And on September 29th, Ford’s last start of the season, Ford was dominating the Red Sox, had thrown six shutout innings with 9 strikeouts, but came out of the game to save himself for the World Series. Arroyo gave up a run, so the win went to the bullpen.

One can be both lucky and good. I understand now that it wasn’t truly a historic season; it merely appeared to be so at the time, but if Ford didn’t deserve the Cy Young Award, who did? Jack Kralick seems like a completely unsupportable candidate, to me.

My conclusion is that Ford did in fact deserve the Cy Young Award; he was in fact the best pitcher in the league, and the best pitcher in the major leagues. What the other systems are missing mostly, I think, is the consistency with which Ford put his team in a position to win. I credit Ford with a deserved won-lost record of 19-12, and 8.1 WAR. The top 5 by my math: Ford, Jim Bunning, Camilo Pascual, Frank Lary and Steve Barber. I would have Kralick as the league’s 11th best starting pitcher.

Kralick and Curt Simmons

In the National League in 1961 there was a pitcher, Curt Simmons, who is in many respects very similar to Jack Kralick, and in other respects exactly the opposite of him.

Simmons had a career winning percentage of .513 and a career ERA of 3.54. Kralick had a career winning percentage of .521 and a career ERA of 3.55. Simmons, one of the first Bonus Babies, had had a long career. He had come to the majors at the age of 18 in 1947. He had perhaps the best fastball in baseball as a young pitcher, but after a few good seasons he had suffered for years with not very good Philadelphia Philly teams, had worn out, gone to the minors to get re-started, and had wound up in St. Louis. Kralick was a young pitcher with no past to speak of, but in 1961 he was exactly the same thing that Simmons was: a left-hander with an average/below average fastball who could win a little more than half of his games because he got ground balls and didn’t walk people and he knew how to pitch.

And, as Kralick is seen by one pathway through the numbers as the best pitcher in his league, so too is Simmons. The National League ERA was almost exactly the same as the American League ERA, 4.03 and 4.02, and Simmons also was working in a hitter’s park. The Busch Stadium Park Factor in 1961 was a whopping 130, the highest in baseball although, as was true with Minnesota, it wasn’t actually that much of a hitter’s park, you just don’t get a true read with one season’s data.

Simmons had a 3.13 ERA in 30 games, 196 innings. With a 4.03 league ERA and a Park Factor of 130, a 3.13 ERA is really good. His ERA+ was 141, which was the best in the National League. Simmons was at that level quite a bit; he had two seasons in which his ERA+ was 144. His ERA+ was third in the league in 1960 and 1963, second in the league in 1954, and among the top ten in the league eight times. He pitched almost all of his career in hitter’s parks, but still managed to post consistently better-than-league ERAs.

The 1961 St. Louis Cardinals, for whom Simmons labored, had had trouble at shortstop since Marty Marion got hurt more than ten years earlier. The situation had gone steadily downhill, as it will if you don’t decide what you want to do at a position and commit yourself to it, and by 1961 it had reached the point of being laughable. The Cardinals kept switching between shortstops all year. For the season, Alex Grammas played 351 innings at shortstop for the Cardinals; he is listed in many sources as the Cardinals’ regular shortstop because he played in more games at shortstop than anyone else, but half of those games were late-inning defensive replacements, and when he started the game, he often came out for a pinch hitter. He was 35 years old, all year, and he had been a pretty good defensive shortstop, but you know, he was 35 years old, and had spent several years as a utility infielder. He had been traded from Cincinnati to St. Louis and then back to Cincinnati and then back to St. Louis. He got 351 innings at shortstop, Bob Lillis had 384, Daryl Spencer had 317, Jerry Buchek got 230, and Julio Gotay got 88.

They were all terrible; they were all second basemen trying to play shortstop except Gotay, who was a legitimate shortstop except ridiculously error prone. Among the five of them they made 53 errors. Gotay fielded .804 at shortstop, Buchek .912, Lillis .928, Spencer .956 and Grammas .960. The 53 errors was 19 more than any other team in the National League—at shortstop or any other position. It’s a LOT of errors, and also, Cardinal second basemen made 27 errors, which was second in the league at the position.

They particularly liked to make errors when Simmons was on the mound. It’s not surprising; left-handed ground ball pitcher, you’re going to get GB6 when he is on the mound. Simmons was a really good hitter; not that that is relevant but it is part of the story. He had hit over .200 every year from 1956 through 1960, and in 1961 he hit .303 in 77 plate appearances with 8 walks, giving him a .378 on-base percentage. He was probably the best hitting pitcher in baseball in 1961, I don’t know.

In San Francisco on May 31, Simmons was leading 2-0 going into the bottom of the sixth. Ground ball to short to lead off the inning, shortstop (Bob Lillis) made an error to get the inning rolling. He got one out but then walked Joey Amalfitano, which meant that he had to pitch to Willie Mays with two men on. Mays hit a double. There were two un-earned runs in the inning, and Simmons wound up losing the game 3-2 although he pitched pretty well.

My point is, it’s not ALL his fault, but it isn’t all the shortstop’s fault, either. When the shortstop makes an error leading off the inning with Mays and Cepeda due to bat, that’s not helpful, but when you walk Joey Amalfitano to pitch to Willie Mays, that’s not the shortstop’s fault.

His next start was June 5 against the Cubs. Simmons was leading 4-2 heading into the 7th. He gave up singles to the first two hitters, both very weak hitters, and then the ball was tapped in front of the plate, probably a bunt. The catcher threw to third base and threw wildly, E-2. Simmons came out of the game, still leading 4-2 but with the bases loaded and nobody out. The bullpen allowed all three runs to score, all of them un-earned, although the Cardinals rallied to win the game anyway.

Again, you can’t really say that Simmons is responsible for those three runners, one of whom reached on an error and none of whom scored while he was on the mound, but you can’t say that he wasn’t responsible at all, either. He was SOMEWHAT responsible.

Simmons’ next start, June 10 against Cincinnati. Gene Freese hits a double to center with a runner on first. That run is earned, but Curt Flood throws wildly to home plate, allowing Freese to take third on an E-8, and he will score on a sac fly, so that’s three straight starts with unearned run(s).

His next start, June 17 at Pittsburgh. In the first inning Ken Boyer botches a ground ball to put Dick Groat on first with Roberto Clemente coming up. Clemente hit .351 that year, with 23 homers, but Groat attempts to steal second and is out, so that one is no harm/no foul. Simmons gives up a couple of runs early but hits a leadoff double in the third inning, setting up a 2-run inning to give the Cardinals a 3-2 lead.

Same game, bottom of the third, one out, Bill Virdon singles. Dick Groat is up. Groat is very slow and hits a ton of ground balls, and he answers with a double play ball to the shortstop, Julio Gotay, but Gotay rushes the pickup and kicks the ball away, E-6. Runners on first and third, and that brings Clemente to the plate again. Clemente grounds out, but the run scores, un-earned run, tie game.

Still the same game, bottom of the sixth, still 3-3. With one out the pitcher, Bob Friend, hits a ground ball to Gotay. Gotay drops it again, another E-6. Virdon singles and Dick Groat hits a fly ball; un-earned run, 4-3 game.

Still the same game, eighth inning, game still 4-3. Simmons gives up leadoff singles to Groat and Clemente, and leaves the game. He is out of the game but still responsible for the runners on base. The reliever strikes out Dick Stuart, and Don Hoak hits a ground ball to Gotay, should be an inning-ending double play, but Gotay boots it again, his third error of the game, all three of them leading to un-earned runs.

Simmons has one start without an unearned run, then on June 27 faces the Braves. Frank Bolling hits a ground ball to short to start the sixth inning, but the shortstop (Alex Grammas) boots it, giving Simmons the opportunity to face Eddie Mathews with a man on base. Mathews homers, so there’s two runs, one of them un-earned.

Simmons’ next start is the first game of a Fourth of July double header against the Phillies. First batter of the game, Tony Taylor, hits a single, but the second hitter hits a double play grounder to the shortstop, Jerry Buchek. Buchek boots it, setting up a three-run inning, all three runs un-earned because of Buchek’s boot.

Simmons, however, draws a leadoff walk in the fifth inning, scores on Bill White’s homer, the Cardinals lead 4-3. In the sixth inning Simmons bats with two out and a man in scoring position, hits a run-scoring single to make it 5-3. By the top of the eighth the Cardinals have a big lead.

Eighth inning, two on, two out. Ground ball to short. Alex Grammas is playing short by now, as a defensive replacement for Buchek, but Grammas commits another error, loading the bases with two out. Simmons leaves the game, but all three runs eventually score, all three un-earned. Simmons is credited with the win despite allowing six un-earned runs in the game. Simmons has given up 16 un-earned runs in seven starts.

On the season, Simmons was charged with 23 un-earned runs, an exceptionally high number for 196 innings. Kralick was charged with 4 un-earned runs, Simmons 23—despite which Baseball Reference, making their absolute best and serious efforts to assign credit and responsibility, says that Kralick’s defensive support was poor—negative .09 runs per nine innings—while Simmons’ defensive support was exceptionally good, positive .30 runs per nine innings, or positive six and a half runs over the course of the season.
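The conversion from a per-nine-innings rate to a season total is simple enough to sketch in Python. This uses only the figures quoted above for Simmons (+0.30 runs per nine innings, 196 innings); the function name `season_runs` is my own, and this is an illustration of the arithmetic, not Baseball Reference's actual method.

```python
def season_runs(rate_per_nine, innings):
    """Scale a runs-per-nine-innings rate to a season total."""
    return rate_per_nine * innings / 9.0

# Simmons: +0.30 runs of defensive support per nine innings over 196 innings
simmons_support = season_runs(0.30, 196)

print(round(simmons_support, 1))
```

That lands at about six and a half runs, matching the figure in the text.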

Of course, what Baseball Reference means by "defensive support" is not just errors—is not, and should not be. There is a great deal more to defense than just errors. The Cardinals, despite playing in what appears to be the best hitter’s park in baseball, despite a below-average number of strikeouts by pitchers and a far worse-than-average number of walks, led the National League in ERA. You have to explain it somehow. Baseball Reference credits the Cardinal defense with saving about 45 runs. But is that true?

Well . . . it’s a bit of a reach. The Cardinals committed 166 errors, second-most in the league. They have three regulars who are outstanding defensive players: first baseman Bill White, third baseman Ken Boyer, and center fielder Curt Flood. But they have no regular catcher; their catchers are not as defensively troublesome as their shortstops, but they’re definitely not good. They have a 40-year-old in left field, Stan Musial, and a first baseman in right field, Joe Cunningham. Their second baseman, Julian Javier, is extremely quick but a rookie, and error-prone. It is VERY difficult to see how that team gets to be 45 runs better than average in the field.

Their top four starters, on the other hand, are Bob Gibson (251 major league wins), Larry Jackson (194 wins), Curt Simmons (193 wins) and Ray Sadecki (135 wins in an 18-year career.) Those guys were all in the majors a long, long time for a reason. The 130 Park Factor isn’t actually right, and the 45 runs for the defense isn’t actually right. It is NOT a convincing analysis.

But the real point I am trying to get to is this. Baseball Reference WAR assumes that the pitcher is AS MUCH responsible for an un-earned run as he is for an earned run, and thus makes no distinction between the two. Using that assumption, they reach the conclusion that Kralick is the most valuable pitcher in the American League.

ERA+ assumes that the pitcher is not at all responsible for un-earned runs, 100% off the hook. Using that assumption, ERA+ concludes that Curt Simmons was the most effective pitcher in the National League.

If you evaluate these pitchers based on ERA, Simmons is half a run ahead, 3.13 to 3.61. If you evaluate them by TOTAL runs allowed per nine innings, Kralick is far ahead, 3.76 to 4.18.

But if you assume that the pitcher should be held 50% responsible for the un-earned run, not 100%, then Simmons and Kralick are almost exactly the same—3.65 (Simmons) and 3.68 (Kralick). So what is the best answer?

It’s obvious, isn’t it? The best answer is that the pitcher does not bear the SAME responsibility for an unearned run as for an earned run, but he bears SOME responsibility for the un-earned run, which we will call 50% until we have some better answer.

And that’s what the Game Score system does. It charges the pitcher twice as much for an earned run as for an un-earned run. That was my point in writing this. I am arguing for my system.
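The Simmons and Kralick comparison above can be sketched in a few lines of Python. This is only the earned/unearned weighting, not the Game Score formula itself; the function name and the `unearned_share` parameter are my own labels, and the inputs are the rounded per-nine-innings figures quoted in this section.

```python
def blended_runs_per_nine(era, total_ra9, unearned_share=0.5):
    """Runs per nine innings with unearned runs charged at a partial share.

    era            : earned runs per nine innings
    total_ra9      : all runs (earned plus unearned) per nine innings
    unearned_share : fraction of each unearned run charged to the pitcher
                     (0.0 reproduces ERA, 1.0 reproduces total runs per nine)
    """
    unearned_per_nine = total_ra9 - era
    return era + unearned_share * unearned_per_nine

# (ERA, total runs per nine innings) as quoted in the article
pitchers = {
    "Simmons": (3.13, 4.18),
    "Kralick": (3.61, 3.76),
}

for name, (era, ra9) in pitchers.items():
    print(name, round(blended_runs_per_nine(era, ra9), 2))
```

Because the inputs are already rounded to two decimals, the results can differ from the 3.65 and 3.68 quoted above by a hundredth or so. Setting `unearned_share` to 0.0 gives the ERA+ assumption, and 1.0 gives the assumption attributed above to Baseball Reference WAR.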

1961 NL—Sandy Koufax (D-WAR) vs. Don Cardwell (R-WAR)

Koufax was 18-13 with 269 strikeouts—a modern National League record at the time—and a 3.52 ERA. Don Cardwell was 15-14 with 156 strikeouts and a 3.82 ERA. The Park Effects are similar—104 Park Factor for Koufax, 105 for Cardwell. Cardwell gave up 22 un-earned runs; Koufax, 17.

Cardwell is selected by Baseball Reference as the best pitcher in the league in substantial part because they believe that the defense surrounding him was awful. They chart them at -0.46 runs per nine innings, or about 13 runs below average over the season. On the one hand I don’t believe that that’s an unreasonable number, as the Cubs’ defense in 1961 was very bad, but on the other hand I don’t think there is any way in hell that Don Cardwell was the best pitcher in the league. I have Koufax with a deserved record of 17-10, 7.7 WAR, which is less than Whitey Ford but more than any other pitcher in baseball, and I have Cardwell with a Deserved Won-Lost record of 15-13 and 5.5 WAR, which was 8th in the league behind Koufax, Jim O’Toole, Warren Spahn, Mike McCormick, Joey Jay, Bob Gibson and Larry Jackson.

COMMENTS (4 Comments, most recent shown first)

raincheck
Loved this. Thanks.

WAR is a very useful thing, but too many use it as the whole debate. Your analysis of 1961, breaking down how WAR works, and the assumptions that are built into it, is particularly elegant. It shows how we have to understand how a number is arrived at before waving it around. Like any stat, it is widely misused by people who don’t know how it really works, and what the limitations are.

WAR is a great starting point for a discussion, but it should not be the whole discussion, particularly when it says things about Jack Kralick that even his mother never would have said.

Make Amalfatano Great Again!
10:03 AM Feb 24th

LesLein
In your book on managers you said that Stengel adjusted the Yankees’ rotation to pitch Ford against the toughest teams. Did this affect your assessment of Ford’s career?
7:23 PM Feb 22nd

villageelliott
Who is the president?
1:54 PM Feb 22nd

shthar
As Vice-President of the Joey Amalfitano fan club, I'd like to say, how dare you!
4:55 AM Feb 22nd
