Porlando and Vercello
In regard to the American League’s 2016 Cy Young vote, a notion has arisen that a vote for Rick Porcello was an old-line, traditionalist vote based on the won-lost record, whereas a vote for Justin Verlander was a modern, forward-looking vote based on sophisticated analysis. This is not an accurate perception, and, in the third part of this article, I will attempt to show why it is not. But first, I need to clear the decks in regard to a couple of related issues.
Part 1: testing the proposition that, prior to about 2005 or 2009, a pitcher as far behind Porcello in the won-lost record as Verlander was would have had less than zero chance of being taken seriously in the Cy Young voting, but that this is no longer true.
My old, first, crude Cy Young formula from the 1970s was 2W – L (two times wins, minus losses). Porcello, at 22-4, has a 2W – L score of 40 (44 – 4); Verlander, at 16-9, a score of 23 (32 – 9). Historically, a pitcher with a score of 40 or 41 has a 58% chance of winning the Cy Young Award, whereas a pitcher with a score of 22 or 23 has a 1% chance. This is the full chart:
2W – L Group | Count | Winners | Vote Pct | Win Pct
42 or more | 19 | 16 | 77% | 84%
40 | 12 | 7 | 63% | 58%
38 | 20 | 10 | 63% | 50%
36 | 27 | 8 | 51% | 30%
34 | 59 | 20 | 48% | 34%
32 | 60 | 7 | 28% | 12%
30 | 111 | 13 | 21% | 12%
28 | 112 | 2 | 11% | 2%
26 | 150 | 6 | 8% | 4%
24 | 179 | 3 | 4% | 2%
22 | 210 | 2 | 2% | 1%
20 | 284 | 0 | 1% | 0%
18 | 283 | 1 | 1% | 0%
16 | 291 | 0 | 0% | 0%
15 or less | 647 | 1 | 0% | 0%
First, I have to explain my study group. I studied all pitchers:
  From 1956 to 2015,
  Who made 20 or more starts,
  And who had a 2W – L score no more than 22 points worse than the best such score of any pitcher eligible for that particular Cy Young Award,
  Except that I eliminated the data from the leagues in which the Cy Young Award went to a reliever.
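A minimal Python sketch of how the study-group criteria above could be applied. The record layout (`PitcherSeason`, `award_group`, `reliever_won`) is a hypothetical structure invented for illustration, not the author's actual files; `award_group` would be something like a single award for 1956-1966 and separate AL/NL awards afterward. The "best score" is taken over every pitcher passed in for that award, since the criterion refers to any eligible pitcher.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    # Hypothetical record layout, for illustration only.
    name: str
    year: int
    starts: int
    wins: int
    losses: int
    award_group: str    # which Cy Young Award the pitcher was eligible for
    reliever_won: bool  # True if that award went to a reliever


def score(p: PitcherSeason) -> int:
    """Crude 2W - L score: two times wins, minus losses."""
    return 2 * p.wins - p.losses


def study_group(seasons: list[PitcherSeason]) -> list[PitcherSeason]:
    # Best 2W - L score among all pitchers eligible for each award.
    best: dict[tuple[int, str], int] = {}
    for p in seasons:
        key = (p.year, p.award_group)
        best[key] = max(best.get(key, score(p)), score(p))
    return [
        p for p in seasons
        if 1956 <= p.year <= 2015                           # years studied
        and p.starts >= 20                                  # 20 or more starts
        and best[(p.year, p.award_group)] - score(p) <= 22  # within 22 points of the best
        and not p.reliever_won                              # drop reliever-won awards
    ]
```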
I didn’t include 2016 because I don’t have the 2016 data blended into my files yet, and I eliminated the years when a reliever won the Cy Young Award because I figured that those weren’t helpful to understanding Porcello vs. Verlander. When I publish this I will hear from two or three people who skip the details and want to tell me that I have 12 pitchers who score at 40 or 41 when actually there are 43 such pitchers in history, so what’s wrong with my study. Anyhoo. . .explaining. There are 12 pitchers in the study who have 2W – L scores of 40 or 41, of whom 7 won the Cy Young Award, or 58%. The pitchers had an average vote share of 63%, meaning that they got 63% of the maximum possible total of Cy Young votes. On the other hand, there are 210 pitchers in the study who had 2W – L scores of 22 or 23, of whom two won Cy Young Awards, or 1%, and they had an average vote percentage of 2%. So we can see, historically, that a pitcher who has a won-lost record of 22-4 will usually win the Cy Young Award, whereas a pitcher who has a record of 16-9 has a 99% chance of NOT winning it. That says nothing about who SHOULD win the Cy Young Award, only about who might.
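The grouping in the chart can be sketched in Python as below. Each numbered group covers an even score and the odd score just above it (the "40" group is "40 or 41"), which matches the counts quoted in the paragraph above. The function names are illustrative only.

```python
def score_group(score: int) -> str:
    """Map a 2W - L score to the chart's two-point groups."""
    if score >= 42:
        return "42 or more"
    if score <= 15:
        return "15 or less"
    # e.g. 40 and 41 both fall in the "40" group
    return str(score - score % 2)


def win_pct(count: int, winners: int) -> int:
    """Share of pitchers in a group who won the award, as a whole percent."""
    return round(100 * winners / count)


print(score_group(40), win_pct(12, 7))   # Porcello's group: 40 58
print(score_group(23), win_pct(210, 2))  # Verlander's group: 22 1
```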
But this, of course, demonstrates in hard numbers something that we all knew to be true anyway, so it’s not that helpful. I looked at how this has changed over time. Since 2006, a pitcher who has a 2W – L score of 22 or 23 has a 5% chance to win the Cy Young Award, and a pitcher with a score of 40 has a 100% chance to win, but again. . .not all that helpful.
Next I looked at the margin between competing pitchers in this area. Since Porcello has a 2W – L score of 40 and Verlander 23, the margin between them is 17; Verlander can be coded as 17 points behind, and Porcello as "0", meaning that he has the best won-lost record of any pitcher competing for the award. Each game that moves from the loss column to the win column is worth three points in this system (two for the added win, one for the removed loss), so a point can be seen as a third of a game. In other words, 21-7 is three points better than 20-8 (35 to 32), 20-8 is three points better than 19-9 (32 to 29), 19-9 is three points better than 18-10 (29 to 26), etc.
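The margin and the "three points per game" rule are simple arithmetic; a small Python sketch (the function names are hypothetical) makes them concrete. The margin is returned as a positive number of points behind the leader, with 0 meaning the pitcher has the best score.

```python
def two_w_minus_l(wins: int, losses: int) -> int:
    return 2 * wins - losses


def margin(record: tuple[int, int], best: tuple[int, int]) -> int:
    """Points behind the best 2W - L score among the candidates (0 = the leader)."""
    return two_w_minus_l(*best) - two_w_minus_l(*record)


porcello, verlander = (22, 4), (16, 9)
print(margin(verlander, porcello))                # 17
print(round(margin(verlander, porcello) / 3, 1))  # 5.7 games
# Turning one loss into a win is worth 3 points: 21-7 vs 20-8 is 35 vs 32.
print(two_w_minus_l(21, 7) - two_w_minus_l(20, 8))  # 3
```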
What is most interesting in this chart is the impact of the first game, the first three points. There are 111 pitchers in the study who had the best 2W – L score in their league (or in the two leagues combined in 1956-1966, when a single award covered both). . .obviously counting ties as both having the best. That is entered in the chart below as "0", meaning that that pitcher is 0 points away from having the best won-lost record in the league. Those 111 pitchers had an average vote percentage of 66%, and 54% of them actually won the award. But when we look at pitchers who are just ONE GAME OR LESS below that level, their chances of winning the Cy Young Award are more than cut in half, dropping to 20%. The sum of the "1", "2" and "3" pitchers below is 17 for 85, exactly 20%:
Margin Group | Count | Winners | Vote Pct | Win Pct
0 | 111 | 60 | 66% | 54%
1 | 22 | 6 | 40% | 27%
2 | 25 | 6 | 30% | 24%
3 | 38 | 5 | 27% | 13%
4 | 40 | 6 | 28% | 15%
5 | 40 | 1 | 16% | 3%
6 | 31 | 4 | 20% | 13%
7 | 53 | 2 | 11% | 4%
8 | 63 | 2 | 8% | 3%
9 | 79 | 2 | 7% | 3%
10 | 97 | 0 | 3% | 0%
11 | 99 | 0 | 4% | 0%
12 | 99 | 0 | 3% | 0%
13 | 122 | 0 | 2% | 0%
14 | 134 | 0 | 1% | 0%
15 | 156 | 1 | 2% | 1%
16 | 152 | 0 | 1% | 0%
17 | 163 | 0 | 1% | 0%
18 | 176 | 0 | 0% | 0%
19 | 190 | 0 | 0% | 0%
20 | 184 | 0 | 0% | 0%
21 | 192 | 1 | 1% | 1%
22 | 198 | 0 | 0% | 0%
If a pitcher’s won-lost record is more than one but no more than two games worse than the best in the league, his chance of winning the Cy Young Award drops to 10% (11 of 111); if he is more than two but no more than three games below that level, it drops to 3%. And if he is more than 3 games below the level of the best in the league, his chance of winning the award basically drops to zero, historically. In recent years it is slightly above zero, but it is still very, very low. In the period 1956 to 1980, the pitcher who had the best won-lost record in the group had a 62% chance to win the award. From 1981 to 2005 this drops to 55%, and from 2006 to 2015 it drops to 40%. This is the data from 2006 to 2015:
Margin Group | Count | Winners | Vote Pct | Win Pct
0 | 25 | 10 | 63% | 40%
1 | 5 | 2 | 45% | 40%
2 | 6 | 1 | 21% | 17%
3 | 13 | 0 | 20% | 0%
4 | 7 | 1 | 18% | 14%
5 | 9 | 0 | 8% | 0%
6 | 9 | 2 | 36% | 22%
7 | 14 | 1 | 12% | 7%
8 | 19 | 0 | 7% | 0%
9 | 25 | 1 | 8% | 4%
10 | 21 | 0 | 1% | 0%
11 | 27 | 0 | 4% | 0%
12 | 41 | 0 | 3% | 0%
13 | 31 | 0 | 4% | 0%
14 | 26 | 0 | 0% | 0%
15 | 36 | 0 | 3% | 0%
16 | 37 | 0 | 1% | 0%
17 | 33 | 0 | 2% | 0%
18 | 43 | 0 | 1% | 0%
19 | 48 | 0 | 1% | 0%
20 | 44 | 0 | 0% | 0%
21 | 40 | 1 | 3% | 3%
22 | 58 | 0 | 1% | 0%
We can still see, even in the most recent data, that having THE BEST won-lost record in the league, rather than the second-best, is tremendously important in Cy Young voting. If your won-lost record is two to three games behind the best in the league, your chance of winning the award is still only 3 to 4%.
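Pooling adjacent margin groups, as done in the text for the "1", "2" and "3" rows, means summing counts and winners before dividing, rather than averaging the rows' percentages. The Python sketch below uses the (count, winners) pairs from the two margin charts; the dictionary and function names are illustrative.

```python
# (count, winners) for margin groups 0-9, from the 1956-2015 chart
ALL_YEARS = {0: (111, 60), 1: (22, 6), 2: (25, 6), 3: (38, 5), 4: (40, 6),
             5: (40, 1), 6: (31, 4), 7: (53, 2), 8: (63, 2), 9: (79, 2)}
# margin groups 7-9 from the 2006-2015 chart
RECENT = {7: (14, 1), 8: (19, 0), 9: (25, 1)}


def pooled(table: dict[int, tuple[int, int]], groups: range) -> tuple[int, int, float]:
    """Sum count and winners over the groups, then compute the win share."""
    count = sum(table[g][0] for g in groups)
    winners = sum(table[g][1] for g in groups)
    return winners, count, winners / count


for label, table, groups in [("up to 1 game", ALL_YEARS, range(1, 4)),
                             ("1 to 2 games", ALL_YEARS, range(4, 7)),
                             ("2 to 3 games", ALL_YEARS, range(7, 10)),
                             ("2 to 3 games, 2006-2015", RECENT, range(7, 10))]:
    winners, count, share = pooled(table, groups)
    print(f"{label}: {winners} of {count} = {share:.0%}")
```

This prints 17 of 85 (20%), 11 of 111 (10%), 6 of 195 (3%) and, for the recent years, 2 of 58 (3%).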
OK, those were the things I did that didn’t actually work all that well. The thing that I did that DID work pretty well was this. I took the "Season Score" formula that I use, a formula which predicts the Cy Young vote with a good deal of accuracy, and created a version of it which takes out all references to wins, losses, and saves. This creates a "NOWOL Effectiveness Score", NOWOL standing for No Won Lost.
Then I ranked all of the Cy Young candidates in each group based on (1) the won-lost record, and (2) their NOWOL score. When you take the wins and losses out, what is left in the formula is innings pitched, runs and earned runs allowed, strikeouts, walks and hit batsmen. I was trying to measure "How much have things changed in this area?" How much different is the voting NOW as opposed to other years? For illustration, in 1956 (the first Cy Young vote) Don Newcombe had the best won-lost record of any candidate (27-7), but the 7th-best NOWOL score, behind Warren Spahn, Early Wynn, Herb Score, Johnny Antonelli, Lew Burdette and Whitey Ford. In 1957 Warren Spahn won the award, although he had the second-best won-lost record (behind Jim Bunning) and also the second-best NOWOL score (also behind Jim Bunning). In 1958 Bob Turley won the Cy Young Award; he had the best won-lost record in the majors but the 8th-best NOWOL score, with his teammate Whitey Ford having the best and Warren Spahn the second-best.
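The ranking step can be sketched in Python. Ties share the better rank (as with ties for the best won-lost record), the candidate scores in the first example are hypothetical, and the per-season winner ranks are only the three seasons described in the paragraph above, not the full study data. The NOWOL formula itself is not reproduced here.

```python
from statistics import mean


def competition_rank(scores: dict[str, float]) -> dict[str, int]:
    """Rank 1 = highest score; tied candidates share the better rank."""
    return {name: 1 + sum(other > s for other in scores.values())
            for name, s in scores.items()}


# Hypothetical 2W - L scores showing how a tie for the lead is handled.
print(competition_rank({"A": 40, "B": 40, "C": 35}))  # {'A': 1, 'B': 1, 'C': 3}

# (won-lost rank, NOWOL rank) of each winner, from the text:
# Newcombe 1956, Spahn 1957, Turley 1958
winner_ranks = [(1, 7), (2, 2), (1, 8)]
print(round(mean(r[0] for r in winner_ranks), 2))  # 1.33
print(round(mean(r[1] for r in winner_ranks), 2))  # 5.67
```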
OK, so that’s the method. In the first group in my study, 1956 to 1980, the Cy Young winner had an average rank of 1.56 in the won-lost record, but an average rank of 2.91 in the NOWOL score. In the second group, 1981 to 2005, the Cy Young winner had an average rank of 1.86 in the won-lost record, but 2.55 in the NOWOL score. And in the third group in my study, the years 2006 to 2015, the Cy Young winner had an average rank of 3.30 in the won-lost record, but 1.50 in the NOWOL score.
That’s a LITTLE BIT misleading, because we are only talking about 20 contests, so one anomalous result has a large impact. There is an anomalous result, which is Felix Hernandez winning the Cy Young Award in 2010 with the 27th-best won-lost record in the league. But even if you throw out that one anomalous result, the scores are still 2.05 and 1.50. The NON-won-lost elements of a pitcher’s record are more important in determining the Cy Young Award, in the last ten years, than the won-lost record.
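The 2.05 figure can be checked from the stated average alone: an average rank of 3.30 over 20 contests implies a rank total of about 66, and removing the 27 leaves 39 over 19 contests. A one-function Python sketch; since 3.30 is itself a rounded figure, the result is approximate.

```python
def mean_without(mean: float, n: int, dropped: float) -> float:
    """Average of n values after removing one of them, given only the old mean."""
    return (mean * n - dropped) / (n - 1)


print(round(mean_without(3.30, 20, 27), 2))  # 2.05
```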
Looking back by this method, we can actually see that the voting was starting to change before 2006. . .the numbers were 1.56 (won-lost) and 2.91 (NOWOL) from 1956 to 1980, but 1.86 (won-lost) and 2.55 (NOWOL) from 1981 to 2005, so the thinking of voters was changing BEFORE 2006. But when did it really change? If you divide the 1981 to 2005 period into two groups, 1981 to 1991 and 1992 to 2005, you see that the average ranks were 1.81 (won-lost) and 3.14 (NOWOL) in the first half of that time period, but 1.91 (won-lost) and 1.95 (NOWOL) in the second half. So from 1992 forward, other factors were basically AS IMPORTANT as the won-lost record in determining the Cy Young vote.
Here’s what really happened, I think. You will remember that in the first half of the 1980s there was a series of "Bad Years" and bad votes for the Cy Young Award. This is basically because there weren’t any great pitchers in that era; it is the Era of No Great Pitchers. Steve Stone won in 1980 with pretty mediocre NOWOL stats (5th-best in the league). In 1982 Pete Vuckovich won the AL award with the best won-lost record in the league but the tenth-best NOWOL score. In 1983 LaMarr Hoyt won the award with the best won-lost record in the AL but the 6th-best NOWOL score. The NL was similar, although not as bad.
Basically, what was happening was that the voters were looking for dominant pitchers like Seaver, Koufax, Gibson, Carlton, Palmer and Guidry, but, not finding any dominant pitchers, didn’t really know what to do. But here is what I didn’t understand before doing this study.
FOLLOWING that "Era of No Great Pitchers" there was a series of non-instructive awards, non-instructive meaning that they don’t tell us anything about the relative importance of the won-lost record vs. the other elements of a pitcher’s record. In 1984 a reliever won in the American League (Willie Hernandez), and Rick Sutcliffe won in the National League, which is a weird award because he came over in mid-season, went 16-1 and led the Cubs to the division championship. In the National League in 1985 Dwight Gooden had both the best won-lost record AND the best NOWOL score, so that’s a non-instructive example. In 1986 Roger Clemens had both the best won-lost record and the best NOWOL score, so that’s non-instructive. In 1987 and 1989 relievers won in the National League, so those are non-instructive as to how voters weight the won-lost record. In 1988 Orel Hershiser had both the best won-lost record in the National League (tied with Danny Jackson) AND the best NOWOL score. In 1989 in the American League Bret Saberhagen had both the #1 won-lost record and the #1 NOWOL score. In 1985 Saberhagen was second in both areas, behind Ron Guidry in the won-lost record and behind Bert Blyleven in the NOWOL score, so again, that is a non-instructive example.
In 1990 the American League award went to Bob Welch, who had the 5th-best NOWOL score in the league, but then, he did win 27 games, so you kind of have to cut the voters a little slack there. Doug Drabek in the National League was #1 in both areas. But the thing is that after the Era of No Great Pitchers, followed by an era of non-instructive examples, things had clearly changed. In 1991 Roger Clemens won the American League award; he had the fifth-best won-lost record in the league but the best stats in the league other than the won-lost record. Tom Glavine won in the National League; he did not have the best won-lost record in the league. Then Greg Maddux won in the National League in 1992, 1993 and 1994; he did not have the best won-lost record in the league in any of those three seasons, but he did have the best NOWOL score. In 1996 Pat Hentgen won, although Andy Pettitte had a better won-lost record. In 1997 Roger Clemens won in the American League, although Randy Johnson had a better won-lost record, and Pedro Martinez won in the National League, although four other pitchers had better won-lost records. The won-lost record was no longer the king of the library. From 1992 to 2005 other statistics were basically AS important in the Cy Young voting as the won-lost record, and since 2006 the other stats have been MORE important than the won-lost record.
OK, let’s drill down now on Porcello vs. Verlander, 2016. Other than the fact that Porcello’s won-lost record is better than Verlander’s by a margin which would have been 100% decisive 100% of the time up to 2005, their summary stats appear to be almost dead even. Porcello pitched 223 innings; Verlander pitched 227.2. Porcello faced 890 batters; Verlander faced 903. Porcello allowed 78 earned runs; Verlander allowed 77. Porcello allowed 238 runners to reach base; Verlander allowed 236. Porcello allowed opponents a .635 OPS, with .268 on base percentage and .367 slugging; Verlander allowed opponents a .630 OPS, with a .263 on base percentage and .368 slugging.
Those are all summary stats, looking at what the individual categories add up to. In individual categories they are not as close. Verlander has way more strikeouts than Porcello, but almost twice as many walks. Verlander gave up 30 home runs; Porcello gave up only 23.
The effects of these things are folded into the summary stats. Verlander has a small advantage in the summary stats, but it is too small to be very meaningful. But when we look at the park effects, things would appear to swing strongly in Porcello’s direction. Porcello pitched in Fenway Park, which had a park factor of 120. Verlander pitched in Comerica, which had a park factor of 102. If you adjust their ERAs for the parks in which they pitched, Verlander goes from 3.04 to 3.01, but Porcello goes from 3.15 to 2.89. Porcello appears to be ahead.
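A Python sketch of the ERA arithmetic. Innings are written in box-score notation, where "227.2" means 227 and two-thirds innings. The park adjustment here is an assumption for illustration (the park factor applied to half of a pitcher's innings, road parks treated as neutral); it reproduces Verlander's 3.01 but gives about 2.86 for Porcello rather than the 2.89 quoted, so the article's figures presumably rest on a somewhat different weighting.

```python
from fractions import Fraction


def innings(ip: str) -> Fraction:
    """Box-score innings to true innings: "227.2" -> 227 2/3."""
    whole, _, outs = ip.partition(".")
    return Fraction(int(whole)) + Fraction(int(outs or 0), 3)


def era(earned_runs: int, ip: str) -> float:
    """Earned runs per nine innings."""
    return float(9 * earned_runs / innings(ip))


def park_adjusted_era(raw_era: float, park_factor: float, home_share: float = 0.5) -> float:
    # Assumption: the park factor (100 = neutral) applies to home_share of the
    # pitcher's innings, and road parks average out to neutral.
    blended = home_share * park_factor / 100 + (1 - home_share)
    return raw_era / blended


verlander = era(77, "227.2")
porcello = era(78, "223")
print(round(verlander, 2), round(park_adjusted_era(verlander, 102), 2))  # 3.04 3.01
print(round(porcello, 2), round(park_adjusted_era(porcello, 120), 2))    # 3.15 2.86
```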
Why, then, is there a perception that Verlander has better analytical numbers than Porcello does? I should say, at some point, that had Verlander won the award, that would have been reasonable; Red Sox fans could not have complained from a position of strength. It would have been like the MVP Award; Mookie was deserving, too, but you can’t really complain about Trout winning it. But back to the question: why is there a perception that Verlander has better analytical numbers than Porcello does?
What I learned from Joe Posnanski, in discussing this before either of us published anything about it, is the clear answer to that question. Here is how Joe put it:
So, you will ask, why was there a perception that Verlander had the better advanced metrics season?
Answer: Baseball Reference.
Baseball Reference WAR
Verlander 6.6
Porcello 5.0
Now, let me pause here to say: Baseball Reference is a miracle. It is the joy of my life and the joy of most baseball writers’ lives. If forced to give up Baseball Reference or a family member, well, it would depend on which family member. But I am convinced that the main reason Justin Verlander got 14 first place Cy Young votes to Porcello’s 8 is because of that fairly sizable gap in Baseball Reference WAR. There might be other factors, but I would wager that this is by far the biggest one.
I say that because Baseball Reference WAR is absolutely the biggest reason I thought that Verlander had the better statistical season.
Hey, I check Baseball Reference WAR every single day of the season. Well, I’m on the site every single day (I imagine many baseball writers are on the site every single day), and WAR is in a box on the front page, updated constantly. That Verlander lead in Baseball Reference WAR absolutely played in my mind all season long. Everything else about the two pitchers was so close that for me it came down to Porcello’s won-loss record or Verlander’s 1.6-win edge on Baseball Reference.
Until Joe explained that to me, I didn’t understand what was going on here. Sure, I use Baseball Reference every day, as most edjactated baseball writers do, but I don’t pay ANY attention to their WAR, or to Fangraphs WAR, or to any WAR unless the government is threatening to draft one of my sons to fight in it. Honestly, I didn’t realize that Baseball Reference HAD a front page. I just bypass that and look up whatever I am trying to look up. Also Tom Tango, who was participating in the discussion with us, helped us both to understand how the pieces were fitting together.
I don’t pay any attention to their WAR for pitchers in part because I don’t believe one should substitute evaluative numbers for educated judgment, but also in part because I know from past studies that their WAR estimate is just not that good; my apologies to Sean, but it isn’t. But I am getting ahead of myself; the next question in the logical sequence is, if Porcello has a better park-adjusted ERA than Verlander in basically the same number of innings, why in the world does Verlander have a WAR which is 32% higher? Isn’t that kind of a big discrepancy?
A little bit of it comes from small details. I gave you their ERAs and park-adjusted ERAs before, but I believe that Baseball Reference uses ALL runs allowed, rather than EARNED runs allowed. Porcello allowed three more unearned runs than Verlander did, so there’s that. Another little thing is that I adjusted their ERAs based on the 2016 park factors, but I believe that Baseball Reference uses a multi-year park adjustment. The multi-year park factor for Fenway is smaller than the 2016 number. Neither is right or wrong; there are problems created by either option, and it’s just not clear which gives you a truer read on the effect of the park.
But the big thing is this. Baseball Reference, I am told, adjusts the pitcher’s performance for the park in which he pitched, and also for the quality of the defense behind him. I am going to say that again, because it turns out to be a really big thing: and also for the quality of the defense behind him.
According to Baseball Info Solutions, the Fielding Bible and John Dewan, the Boston Red Sox’ defense in 2016 was 54 runs better than average. The Tigers’ defense was 50 runs WORSE than average. It’s 104 runs in 162 games. It’s, well. . .a lot of runs.
The logic of the Baseball Reference WAR analysis is that, given the same defense behind them, the same park, Justin Verlander WOULD HAVE allowed significantly fewer runs than Rick Porcello. The question this pushes us to is, Is this actually a reasonable thing to believe?
No, it isn’t. Maybe it is a reasonable adjustment in theory, I don’t know. Maybe if we compared 100 different pitchers, this would be a useful and instructive adjustment in the other 98 cases; I don’t know. But we’re talking about this case.
Verlander faced 903 batters, of whom 349 either homered, walked, struck out or were hit by a pitch. The other 554 put the ball in play.
Porcello faced 890 batters, of whom 257 homered, walked, struck out or were hit by a pitch, and 633 put the ball in play.
This defensive impact adjustment, then, must occur on these balls in play, right? But of the 554 balls in play against Verlander, 141 were hits, a .255 average. Of the 633 balls put in play against Porcello, 170 were hits, a .269 average.
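The balls-in-play arithmetic in the last three paragraphs can be reproduced in a few lines of Python. This is the simple rate used in the text (hits on balls in play divided by all balls in play), not the standard BABIP formula, (H - HR) / (AB - K - HR + SF); the function name is illustrative.

```python
def in_play_average(batters_faced: int, not_in_play: int, hits_in_play: int) -> tuple[int, float]:
    """Balls in play and the hit rate on them, counted as in the text."""
    # not_in_play = home runs + walks + strikeouts + hit batsmen
    in_play = batters_faced - not_in_play
    return in_play, hits_in_play / in_play


for name, bf, other, hits in [("Verlander", 903, 349, 141), ("Porcello", 890, 257, 170)]:
    in_play, avg = in_play_average(bf, other, hits)
    print(f"{name}: {hits} hits on {in_play} balls in play, {avg:.3f}".replace(" 0.", " ."))
```

This prints .255 for Verlander on 554 balls in play and .269 for Porcello on 633.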
Nine players reached on error while Porcello was on the mound (actually eight players, but Max Kepler reached on errors twice while Porcello was pitching. Love Max Kepler.) Only four batters reached on error while Verlander was pitching.
There were 35 doubles against Verlander, 36 against Porcello. There were 4 triples against Verlander, 5 against Porcello.
There were 5 bases stolen against Verlander, with 6 runners caught stealing, whereas there were 7 bases stolen against Porcello, with 3 runners caught stealing.
Also, according to Posnanski:
The biggest difference in the two defenses was in right and centerfield. The Red Sox centerfielder and rightfielder saved 44 runs, because Jackie Bradley and Mookie Betts are awesome. The Tigers’ centerfielder and rightfielder cost 49 runs because Cameron Maybin, J.D. Martinez and a cast of thousands are not awesome.
But the Tigers outfield certainly didn’t cost Verlander. He allowed 216 fly balls in play, and only 16 were hits. Heck, the .568 average he allowed on line drives was the lowest in the American League. I find it almost impossible to believe that the Boston outfield would have done better than that.
Joe says it’s 13 runs, whereas I figure it must be 16, but 13 runs, 16 runs. . .it’s a bunch. Sixteen runs is the difference between Rick Porcello and Jeremy Hellickson or Miguel Gonzalez or Liam Hendriks or Erasmo Ramirez or Jeff Samardzija or Sean Manaea. My point is, you can’t just infer something like that without evidence. The proposition that Verlander WOULD HAVE allowed significantly fewer runs than Porcello given an equal defense is just not reasonable, given the facts on the ground.
We’ll get into the issue of whether we should COMPLETELY ignore wins and losses another time. Justin Verlander is a great pitcher, probably a Hall of Fame pitcher; Rick Porcello will need to have two or three more dominant seasons before he enters that conversation. I haven’t seen Rick Porcello’s wife or lady friend and don’t know if he has one, but if she’s hotter than Kate Upton, that would be surprising.
But there is no reasonable sabermetric argument that Justin Verlander had a significantly better season in 2016 than Rick Porcello, or even, really, that he had a better season at all. The idea that he did was created by flawed analysis.