2017-30
The Proper Weight for the Last Start
OK, first of all, I want you yahoos to know that you kept me up all night, and I need the sleep, frankly. I do have other things I should be working on.
Anyway, I saw how we could "test" whether the proper weighting for the most recent start was 1%, 1.5%, 2%. . .whatever. . .so I had to do it. Lot of work.
  Here’s what I did. I made a simplified version of the ranking system, meaning that it was basically the same thing but missing some of the bells and whistles. In the "real" system, I adjust the Game Score for every game by considering the park in which the game is played and also the quality of the opposing offense. In this system I didn’t do that. It makes a very minor difference. . .really no practical difference at all in most cases.
In the "real" system I create rankings for every day of the calendar year by modifying the pitcher’s last score by the length of time since he has pitched. That’s a LOT of work, a lot of work, and I didn’t do that. I only modified the scores when the pitcher actually pitched, and I only made leader boards for each five-day period of time. If a pitcher started twice in a five-day period he would be listed twice, and if he didn’t start in those five days he wouldn’t be listed at all. That makes more difference than the other thing does, but I don’t see why it should queer the results of the study, which is all that I am interested in.
In the "real" system I include post-season starts. In this version I didn’t include them, because they’re not in the data. That makes some difference in who would rate where, but again, I don’t see ANY reason why it would queer the study, although if Guy123 was still with us I am sure that he would come up with several reasons. In the "real" system we move pitchers backward from season to season based on how many days they are inactive; in this version I just moved pitchers backward 100 points at the start of every new season.
Anyway, it’s a simplified, stripped-down version of the ranking system, and here’s what I did with it. I varied the "weighting percentage" for the most recent start from 1% to 6.5%, doing ratings (and thus rankings) at every half-step in between: 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%.
Having done that, I figured each pitcher’s record during his next ten starts—his average game score, wins, losses, innings pitched, earned runs allowed, and ERA. The theory is. . .the theory will eventually be shown true. . .the theory is that (a) pitchers who rate higher should perform better over their next ten starts, and (b) if the system gets to be more effective, more predictive, then the difference between the high-ranking pitchers and the low-ranking pitchers should increase. Right?
  Joedimino suggested that we should "see how they do in their next start or 5 starts, whatever, but a short timeframe, so ability is basically unchanged." That’s not QUITE right, I don’t think. One thing that makes #1s #1s and #57s #57s is not just that MOMENT, but that there is STABILITY in their performance. I used 10 starts rather than 1 or 5, because if that highly-rated pitcher loses his level of effectiveness almost immediately, that’s relevant, rather than irrelevant. It MATTERS whether he can hold on to that level of effectiveness. But basically that’s what I did, only I used 10 starts rather than 1 or 5.
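The "next ten starts" bookkeeping can be sketched roughly as follows. This is only an illustration: the `Start` record layout and the `follow_up_record` helper are hypothetical names invented for the sketch, not anything from the actual spreadsheet work. ERA is computed from outs as 27 × earned runs / outs, which is the usual 9 × ER / IP.

```python
from dataclasses import dataclass

@dataclass
class Start:
    """Hypothetical record for one start (layout assumed for this sketch)."""
    game_score: int
    win: bool
    loss: bool
    outs: int          # outs recorded, i.e. innings pitched * 3
    earned_runs: int

def follow_up_record(starts, index, window=10):
    """Summarize the starts that come right after starts[index]."""
    nxt = starts[index + 1 : index + 1 + window]
    if not nxt:
        return None  # no later starts to evaluate
    outs = sum(s.outs for s in nxt)
    earned = sum(s.earned_runs for s in nxt)
    return {
        "games": len(nxt),
        "avg_game_score": sum(s.game_score for s in nxt) / len(nxt),
        "wins": sum(s.win for s in nxt),
        "losses": sum(s.loss for s in nxt),
        "era": 27 * earned / outs if outs else None,
    }

# Made-up career of 12 identical starts, just to exercise the function.
career = [Start(60, True, False, 21, 2) for _ in range(12)]
full_window = follow_up_record(career, 0)   # ten starts follow start 0
short_window = follow_up_record(career, 5)  # only six starts remain
print(full_window["games"], short_window["games"])
```

A pitcher with fewer than ten starts remaining simply contributes a shorter window in this sketch, rather than being dropped.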
Also, there is a very, very, very important difference between EXACTLY what I am trying to do and exactly what Tango is doing, I think, which has potentially major impact on the issue of whether what I have found is relevant to HIS method.
Tango says he uses about 1% "decay rate" per start, whereas I am using 3%. But on the issue of which value creates more accurate ratings, there is this very crucial difference: that I start every pitcher out at the bottom of the scale and make them fight their way up the chart, whereas I would not assume that Tango has done this. I am assuming that what Tango is saying is simply that, in valuing the pitcher’s statistics, he de-emphasizes what was done a year ago by 30%, two years ago by 51%, three years ago by 65.7%, etc. That is very different, in that it assumes that what happened in the more distant past was a vacuum, having no impact on the rating. My method essentially assumes that there’s a "dead weight" back there at the start of the process, and the pitcher has to prove that he has shaken off that dead weight in order to move up the scale. Some of you aren’t going to understand what in the hell I am talking about, but Tom will understand it. Tom is essentially starting everyone off in the middle of the scale. . .that’s not EXACTLY true, but that’s as close as I can come to explaining it. I’m starting everyone off at the BOTTOM of the scale. It makes a big difference.
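The 30%, 51% and 65.7% figures are just compounding: keep 70% of the weight each year and see what has been discounted away. A minimal sketch of that arithmetic follows. The second function, which turns a 1% per-start decay into a per-year figure, assumes about 35 starts per season; that number is an assumption made here for illustration, not one given in the article.

```python
# Illustrative arithmetic only. The 35-starts-per-season figure is an
# assumption made for this sketch, not a number from the article.

def deemphasis_after_years(years, yearly_retention=0.70):
    """Share of an old performance's weight that has been discounted away."""
    return 1 - yearly_retention ** years

def yearly_retention_from_per_start(decay_per_start=0.01, starts_per_year=35):
    """Weight still retained after one season of per-start decay."""
    return (1 - decay_per_start) ** starts_per_year

for years in (1, 2, 3):
    print(years, round(deemphasis_after_years(years) * 100, 1))

print(round(yearly_retention_from_per_start(), 3))
```

The loop reproduces the 30%, 51% and 65.7% in the text, and under the 35-start assumption a 1% per-start decay retains roughly 70% of the weight after a season.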
The first place it makes a difference is in the "backward movement" between seasons. In the real system I let a pitcher’s "value number" decay slowly when he is not pitching, such as between seasons. It causes a pitcher to go backward by about 100 points over the winter, more or less; Kershaw ends one season at 620 and starts the next one at 520. This makes sense, to me, because pitchers DO very routinely lose all effectiveness over the course of a winter, so each pitcher has to "prove himself" again every year.
But in this test, when you start everybody out at 300.000 and apply the one-percent per-start adjustment that Tom suggested, nobody gets to 400 points in the first year, so everybody goes back to 300 at the start of every year. . .not absolutely, but generally; it takes a really strong season to get from 300 to 400 in one season, if we’re only making 1% adjustments per game.
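To see why 1% per start barely moves anyone off 300, here is a rough simulation. The article does not spell out the update formula, so the rule below is an ASSUMPTION: the new rating blends the old rating with ten times the Game Score, so that a pitcher who always posts a Game Score of 50 would settle at 500. The function names and the 33-start season are likewise made up for the sketch.

```python
# Sketch under an ASSUMED update rule (not stated in the article):
#   new_rating = (1 - weight) * old_rating + weight * 10 * game_score
# so a pitcher who always posts Game Score 50 would settle at 500.

START_RATING = 300.0  # every pitcher starts here, per the article

def update_rating(rating, game_score, weight):
    """Blend the newest start into the running rating (assumed form)."""
    return (1 - weight) * rating + weight * 10 * game_score

def rating_after_season(game_scores, weight, rating=START_RATING):
    """Apply the assumed update once per start, in order."""
    for gs in game_scores:
        rating = update_rating(rating, gs, weight)
    return rating

# A hypothetical very strong season: 33 starts, each with Game Score 65.
strong_season = [65] * 33
at_one_percent = rating_after_season(strong_season, 0.01)
at_three_percent = rating_after_season(strong_season, 0.03)
print(round(at_one_percent), round(at_three_percent))
```

Under this assumed rule, even that season leaves the pitcher just short of 400 at a 1% weighting, while at 3% the same season carries him well past 450.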
So I had to fix the system for that problem. I did that by changing the "backward shift" between seasons to 33 points, rather than 100, when the weighting for each start was only 1%. When the weighting was changed to 1.5%, then the backward drift was increased to 50 points; when the weighting was increased to 2%, then the backward drift was increased to 66 points. When the weighting was changed to 2.5%, then the backward drift was 83 points, and when the weighting was 3% or more than 3%, then the backward drift was 100 points between seasons.
I don’t know if that makes sense to everybody. When you give more weight to each start, then the pitchers move more rapidly away from the 300.000 where everybody starts, so that in one season they are able to get to 450 or 500. Then, when you move everybody backward by 100 points, that doesn’t reset the system so that all pitchers start the season even; it just re-sets the point totals so that the good pitchers are still ahead, but the numbers are lower.
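The between-season schedule described above can be written down as a small lookup. The five values come straight from the text. The `proportional_shift` function is an inference about the pattern (100 points scaled by weight / 3%, truncated and capped at 100), not a rule the article states.

```python
# The schedule values are taken from the article; reading them as
# "100 points scaled by weight / 3%, truncated" is an inference only.

BACKWARD_SHIFT = {1.0: 33, 1.5: 50, 2.0: 66, 2.5: 83}  # weight % -> points

def backward_shift(weight_pct):
    """Points subtracted from every pitcher at the start of a new season."""
    if weight_pct >= 3.0:
        return 100
    return BACKWARD_SHIFT[weight_pct]

def proportional_shift(weight_pct):
    """Inferred pattern: scale 100 points by weight / 3%, capped at 100."""
    return min(100, int(100 * weight_pct / 3))

for w in (1.0, 1.5, 2.0, 2.5, 3.0, 6.5):
    print(w, backward_shift(w), proportional_shift(w))
```

The two functions agree at every weighting used in the study, which suggests the shift was simply scaled down in step with the per-start weight.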
Anyway. . .what weighting you use sometimes makes a tremendous difference in where pitchers are rated. Wayne Twitchell in 1973 pitched a series of brilliant games. In a stretch of 11 starts he pitched four shutouts and several other excellent games, including a four-hitter in which the only run was unearned. If you weight each of those starts at 6.5%, then Twitchell at the end of that run ranks as the #11 starting pitcher in baseball. If you weight each of them at 1%, he ranks 97th. Big difference. Or. . .perhaps a more relatable example. . .remember that tremendous run that Kris Medlen had in 2012? If you weight each start at 6.5%, Medlen climbs to 26th in the ratings. If you weight each start at 1%, he is still 98th.
Or, on the other end, Pedro Martinez in 2008, when he made 20 starts with a 5.61 ERA. If you weight each start at 1%, Pedro still ranks as the #4 starting pitcher in baseball. If you weight each start at 6.5%, he ranks 105th. Significant difference there. Those are the extreme examples, but every great pitcher has a phase like that at the end of his career, when he can be ranked anywhere from 10th to 95th based on what weight you give to his recent outings. Roy Halladay, Steve Carlton, Randy Johnson. . .they all take a quick tumble like that late in their careers.
OK, so having explained that, here’s what happens. I rated all pitchers 1952 to 2013, and I rated them in each 5-day window in that period. Then I eliminated all the data prior to 1960, because there are gaps in the data early and also you need time after you start the rating system to allow pitchers to find their level, and then I eliminated all the data from five-day windows in which there were fewer than 105 pitchers who made a start.
We have lots and lots of data; we don’t need to mess with the "weak" data. We can work only with the "strong" data. Sometimes you’ll have a five-day window at the start of a season or at the end of a season or over the All-Star break in which there are only a few games played, so the pitcher who rates #1 in that time period might be not a worthy representative of a #1 pitcher. We don’t need that data.
We are left with 1,412 rating periods, and thus with 14,120 pitchers who rank 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th or 10th during one of those rating periods. If we look at the next ten starts for those 14,120 pitchers, then, we would have 141,200 games, understanding that many of these are redundant counts, because the pitchers who rank highly in one five-day rating period almost always rank highly in the NEXT five-day rating period as well. We will call these the "follow up" games.
We don’t quite have 141,200 follow up games, because not everybody makes 10 more starts before their career ends or we hit the end of the study period or something. We have 139,888 follow up games (for pitchers rated 1 through 10).
The Average Game score, for those 139,888 games by the pitchers rated 1 through 10 (with a 1% weighting for each start) was 56.49. They had 63,020 wins, 44,428 losses for a .587 winning percentage, and they had a 3.36 ERA.
Next, let’s look at the pitchers who ranked 50th to 59th on the list. There are also 14,120 of these pitchers. These 14,120 pitchers had 136,885 follow up games. THEIR average game score was 50.32, their winning percentage was .502 (49,832-49,457) and their ERA was 4.12.
Finally, we look at the pitchers who ranked in spots 96 to 105 in each of the 1,412 study periods, another 14,120 pitchers. These pitchers had 125,599 follow-up games, with an average Game Score of 48.11, a .450 winning percentage (39,365-48,163), and a 4.51 ERA.
So we have what a businessman would call proof of concept. The data shakes out the way it SHOULD shake out. The highly-rated pitchers do in fact pitch much better, over their next ten starts, than the lower-rating pitchers. In the study, six pitchers had 10-start stretches in which they went 10-0: Roger Clemens, Bob Gibson, John Smoltz, Gaylord Perry, Justin Verlander and Brandon Webb. No one went 0-10; Hideo Nomo went 0-9 with an 8.60 ERA.
Now, having established that the process works, we can then compare whether it works better or worse if we increase the weight given to each start.
First, I compared the two ends of the chart—1% per start, and 6.5% per start.
It turns out, given this structure, that a 6.5% weighting works better than a 1% weighting. Again, cautionary note: this may not conflict directly with Tom Tango’s position, given his method. The heavier weighting works better, in my system, because it moves deserving pitchers up through the rankings more rapidly than a 1% weighting. But if you don’t start everybody out at the bottom of the scale, then you may not have that effect, so you might do better with a 1% weighting, I don’t know. (I’ll amend this comment later.)
But in THIS system. . .as I said, pitchers ranked 1-10 with a 1% weighting have an average Game Score for their next ten starts of 56.49, a winning percentage of .587, and an ERA of 3.36. But when we increase the weighting to 6.5%, then the average Game Score increases to 57.38, the winning percentage increases to .592, and the ERA drops to 3.28.
On the other end of the scale. . .not the EXTREME other end of the scale, because the extreme other end of the scale might be 132 pitchers or 162 or some other weird number. . .but the pitchers ranked 96 to 105, with the 1% weighting, have an average Game Score of 48.11, a winning percentage of .450, and an ERA of 4.51. But with the 6.5% weighting, the average Game Score drops to 47.28, the winning percentage drops to .443, and the ERA increases to 4.64. The 6.5% weighting does a better job of predicting future performance than does the 1% weighting.
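One simple way to put a number on "does a better job of predicting" is the gap between the top group and the bottom group. The sketch below uses only the average Game Scores quoted above for the 1% and 6.5% weightings. Using the raw difference as the yardstick is a choice made for this illustration, not a statistic the study itself reports.

```python
# Average Game Scores over the next ten starts, copied from the article.
avg_game_score = {
    "1%":   {"1 to 10": 56.49, "96 to 105": 48.11},
    "6.5%": {"1 to 10": 57.38, "96 to 105": 47.28},
}

def spread(groups):
    """Gap in average Game Score between the top and bottom groups."""
    return round(groups["1 to 10"] - groups["96 to 105"], 2)

for weight, groups in avg_game_score.items():
    print(weight, spread(groups))
```

The gap is about 8.4 points of Game Score at 1% and about 10.1 points at 6.5%, which is the widening the theory predicts for the more effective weighting.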
1.5% is more effective at predicting future performance than 1%. 1.5% (as opposed to 1%) increases the average Game Score from 56.49 to 57.10, and the winning percentage from .587 to .592. The ERA drops from 3.36 to 3.29. Parallel changes on the other end of the scale. . .maybe I shouldn’t write out all of those.
2% is as effective as 6.5% at sorting out the top pitchers. With a 2% weight for each game, the pitchers ranked 1-10 have an average Game Score during their next ten starts of 57.35, a winning percentage of .593, and an ERA of 3.27—basically the same data as the 6.5% weighting.
2.5% is. . .well, the data is a little bit mixed, but it appears to be a tiny bit more effective than 2%. I’ll chart the data for you in a moment. The average Game Score in the next ten starts increases from 57.35 to 57.44, and the ERA drops from 3.27 to 3.26, although the Winning Percentage also drops.
3% is a tiny bit more effective than 2.5%.
3.5% is almost indistinguishable from 3%. . .just a tiny, tiny, tiny bit worse.
4% is indistinguishable from 3.5%.
4.5% looks the same as 4%.
OK, it’s 7:00 in the morning, I’ve been up all night, I’m tired and I need to get to bed, so I’m not going to run the data for 5%, 5.5% or 6%. I am almost embarrassed to admit this, but it appears that my instinct (3%) may have been as good a place to put the correct weight as any other; not necessarily BETTER than other options, certainly not markedly better, but as good as any.
Here’s a chart of the data from the study:
| Weight | Rank      | Pitchers | Followup Games | Total Game Score | Average Game Score | Wins  | Losses | Winning Percentage |
|--------|-----------|----------|----------------|------------------|--------------------|-------|--------|--------------------|
| 1%     | 1 to 10   | 14120    | 139888         | 7902691          | 56.49              | 63020 | 44428  | .587               |
| 1%     | 50 to 59  | 14120    | 136885         | 6888067          | 50.32              | 49832 | 49457  | .502               |
| 1%     | 96 to 105 | 14120    | 125599         | 6042409          | 48.11              | 39365 | 48163  | .450               |
| 1.5%   | 1 to 10   | 14120    | 140300         | 8011761          | 57.10              | 64074 | 44175  | .592               |
| 1.5%   | 50 to 59  | 14120    | 137777         | 6892017          | 50.02              | 49559 | 50349  | .496               |
| 1.5%   | 96 to 105 | 14120    | 124878         | 5994503          | 48.00              | 38993 | 47805  | .449               |
| 2%     | 1 to 10   | 14120    | 140428         | 8053345          | 57.35              | 64489 | 44288  | .593               |
| 2%     | 50 to 59  | 14120    | 137841         | 6876234          | 49.89              | 49392 | 50862  | .493               |
| 2%     | 96 to 105 | 14120    | 124188         | 5954408          | 47.95              | 38746 | 47432  | .450               |
| 2.5%   | 1 to 10   | 14120    | 140500         | 8070611          | 57.44              | 64507 | 44580  | .591               |
| 2.5%   | 50 to 59  | 14120    | 138405         | 6912131          | 49.94              | 50010 | 50975  | .495               |
| 2.5%   | 96 to 105 | 14120    | 123391         | 5901065          | 47.82              | 38558 | 47101  | .450               |
| 3%     | 1 to 10   | 14120    | 140525         | 8078235          | 57.49              | 64674 | 44537  | .592               |
| 3%     | 50 to 59  | 14120    | 138626         | 6926620          | 49.97              | 50009 | 50916  | .496               |
| 3%     | 96 to 105 | 14120    | 123398         | 5895252          | 47.77              | 38616 | 47226  | .450               |
| 3.5%   | 1 to 10   | 14120    | 140526         | 8077707          | 57.48              | 64696 | 44556  | .592               |
| 3.5%   | 50 to 59  | 14120    | 138522         | 6904902          | 49.85              | 49558 | 51364  | .491               |
| 3.5%   | 96 to 105 | 14120    | 123107         | 5877328          | 47.74              | 38406 | 47025  | .450               |
| 4%     | 1 to 10   | 14120    | 140491         | 8075853          | 57.48              | 64716 | 44580  | .592               |
| 4%     | 50 to 59  | 14120    | 138447         | 6910182          | 49.91              | 49921 | 51130  | .494               |
| 4%     | 96 to 105 | 14120    | 123082         | 5875938          | 47.74              | 38339 | 46935  | .450               |
| 4.5%   | 1 to 10   | 14120    | 140469         | 8076205          | 57.49              | 64691 | 44581  | .592               |
| 4.5%   | 50 to 59  | 14120    | 138442         | 6913713          | 49.94              | 49886 | 50956  | .495               |
| 4.5%   | 96 to 105 | 14120    | 123300         | 5890179          | 47.77              | 38445 | 46932  | .450               |
| 6.5%   | 1 to 10   | 14120    | 140380         | 8055387          | 57.38              | 64556 | 44449  | .592               |
| 6.5%   | 50 to 59  | 14120    | 138493         | 6939193          | 50.11              | 50193 | 50824  | .497               |
| 6.5%   | 96 to 105 | 14120    | 119568         | 5653119          | 47.28              | 36743 | 46183  | .443               |

| Weight | Rank      | Outs    | Earned Runs | ERA  |
|--------|-----------|---------|-------------|------|
| 1%     | 1 to 10   | 2884593 | 359034      | 3.36 |
| 1%     | 50 to 59  | 2547200 | 388793      | 4.12 |
| 1%     | 96 to 105 | 2176415 | 363927      | 4.51 |
| 1.5%   | 1 to 10   | 2920220 | 356199      | 3.29 |
| 1.5%   | 50 to 59  | 2552887 | 393337      | 4.16 |
| 1.5%   | 96 to 105 | 2157562 | 361694      | 4.53 |
| 2%     | 1 to 10   | 2933198 | 355277      | 3.27 |
| 2%     | 50 to 59  | 2548823 | 395337      | 4.19 |
| 2%     | 96 to 105 | 2141380 | 359162      | 4.53 |
| 2.5%   | 1 to 10   | 2939608 | 355180      | 3.26 |
| 2.5%   | 50 to 59  | 2565213 | 397254      | 4.18 |
| 2.5%   | 96 to 105 | 2123889 | 357815      | 4.55 |
| 3%     | 1 to 10   | 2942246 | 355158      | 3.26 |
| 3%     | 50 to 59  | 2568881 | 397638      | 4.18 |
| 3%     | 96 to 105 | 2123755 | 358822      | 4.56 |
| 3.5%   | 1 to 10   | 2942569 | 355362      | 3.26 |
| 3.5%   | 50 to 59  | 2562431 | 399032      | 4.20 |
| 3.5%   | 96 to 105 | 2116249 | 357931      | 4.57 |
| 4%     | 1 to 10   | 2943089 | 355572      | 3.26 |
| 4%     | 50 to 59  | 2564417 | 397879      | 4.19 |
| 4%     | 96 to 105 | 2116549 | 358093      | 4.57 |
| 4.5%   | 1 to 10   | 2943181 | 355529      | 3.26 |
| 4.5%   | 50 to 59  | 2564429 | 397245      | 4.18 |
| 4.5%   | 96 to 105 | 2122500 | 358410      | 4.56 |
| 6.5%   | 1 to 10   | 2935716 | 356274      | 3.28 |
| 6.5%   | 50 to 59  | 2569612 | 394443      | 4.14 |
| 6.5%   | 96 to 105 | 2048380 | 352094      | 4.64 |

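For anyone who wants to check the charts, the derived columns follow directly from the totals. The sketch below recomputes them for one row (1% weighting, pitchers ranked 1 to 10). The variable names are made up for the example. The winning percentage is wins over decisions, so no-decisions drop out, and ERA is 9 × earned runs / innings, with innings taken as outs / 3.

```python
# Totals for the 1% weighting, pitchers ranked 1 to 10, from the charts.
followup_games = 139888
total_game_score = 7902691
wins, losses = 63020, 44428
outs, earned_runs = 2884593, 359034

avg_game_score = total_game_score / followup_games
winning_pct = wins / (wins + losses)   # decisions only
era = 27 * earned_runs / outs          # 9 * ER / IP, with IP = outs / 3

print(round(avg_game_score, 2), round(winning_pct, 3), round(era, 2))
```

This reproduces the 56.49, .587 and 3.36 reported for that row.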
The one conclusion that I can firmly reach after doing this is that 1% or 1.5% is clearly not the right answer, and I SUSPECT, although I can’t prove this, that it isn’t the right answer in Tom’s method, either. The reason that 1% is not the right weight is that it fails to pick up on pitchers’ loss in effectiveness near the end of their careers.
"Near the end of their careers" is a fraught term. It sounds like we are talking about Roger Clemens in 2004, Chuck Finley in 2002, Kevin Appier in 2002, Jack Morris in 1993, etc., and yes, we are talking about them. Almost every outstanding pitcher has a moment at the end of his career in which (a) he has not been pitching well, and (b) he is going to get really hammered during his next ten starts, but (c) he would still rank as one of the ten top pitchers in baseball at that moment, based on a 1% weighting scale, because the weight isn’t enough to pick up what is happening.
But it is not JUST that moment. There is a Sonny Gray/Jose Quintana moment that happens much earlier to many pitchers. Jose Quintana is not old and has not yet had a "great" career, but he is not pitching well at the moment, either. A lot of guys hit the end of the road when they are 26, 27, 28, 29 years old. A 1% weighting system isn’t going to see that when it is happening.
Other than that. . .you can use 2%, 3%, anything up to 6%, and one predicts the future about as well as another.