The Proper Weight for the Last Start
OK, first of all, I want you yahoos to know that you kept me up all night, and I need the sleep, frankly. I do have other things I should be working on.
Anyway, I saw how we could "test" whether the proper weighting for the most recent start was 1%, 1.5%, 2%. . . .whatever. . .so I had to do it. Lot of work.
  Here’s what I did. I made a simplified version of the ranking system, meaning that it was basically the same thing but missing some of the bells and whistles. In the "real" system, I adjust the Game Score for every game by considering the park in which the game is played and also the quality of the opposing offense. In this system I didn’t do that. It makes a very minor difference. . .really no practical difference at all in most cases.
In the "real" system I create rankings for every day of the calendar year by modifying the pitcher’s last score by the length of time since he has pitched. That’s a LOT of work, a lot of work, and I didn’t do that. I only modified the scores when the pitcher actually pitched, and I only made leader boards for each five-day period of time. If a pitcher started twice in a five-day period he would be listed twice, and if he didn’t start in those five days he wouldn’t be listed at all. That makes more difference than the other thing does, but I don’t see why it should queer the results of the study, which is all that I am interested in.
In the "real" system I include postseason starts. In this version I didn’t include them, because they’re not in the data. That makes some difference in who would rate where, but again, I don’t see ANY reason why it would queer the study, although if Guy123 was still with us I am sure that he would come up with several reasons. In the "real" system we move pitchers backward from season to season based on how many days they are inactive; in this version I just moved pitchers backward 100 points at the start of every new season.
Anyway, it’s a simplified, stripped-down version of the ranking system, and here’s what I did with it. I varied the "weighting percentage" for the most recent start from 1% to 6.5%, doing ratings (and thus rankings) for all half-steps in between: 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%.
Having done that, I figured each pitcher’s record during his next ten starts: his average game score, wins, losses, innings pitched, earned runs allowed, and ERA. The theory is. . .the theory will eventually be shown true. . .the theory is that (a) pitchers who rate higher should perform better over their next ten starts, and (b) if the system gets to be more effective, more predictive, then the difference between the high-ranking pitchers and the low-ranking pitchers should increase. Right?
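A minimal sketch of the "next ten starts" bookkeeping, assuming each pitcher's career is stored as a chronological list of start records. The `Start` type, its fields and both helper functions are hypothetical names used only for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Start:
    game_score: int
    outs: int          # outs recorded in the start (innings pitched * 3)
    earned_runs: int


def next_starts(career, index, n=10):
    """Starts that follow career[index]; fewer than n near the end of a career."""
    return career[index + 1 : index + 1 + n]


def average_game_score(starts):
    """Mean Game Score of a list of starts, or None if the list is empty."""
    return sum(s.game_score for s in starts) / len(starts) if starts else None


# Invented Game Scores for a 12-start career, purely for illustration.
career = [Start(gs, 21, 2) for gs in (40, 55, 62, 70, 48, 51, 66, 73, 58, 45, 60, 52)]
follow = next_starts(career, 0)
print(len(follow), average_game_score(follow))
```

The slice simply runs short when a pitcher has fewer than ten starts left, which is one way to get fewer follow-up games than ten per rated pitcher.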
Joedimino suggested that we should "see how they do in their next start or 5 starts, whatever, but a short timeframe, so ability is basically unchanged." That’s not QUITE right, I don’t think. One thing that makes #1s #1s and #57s #57s is not just that MOMENT, but that there is STABILITY in their performance. I used 10 starts rather than 1 or 5, because if that highly rated pitcher loses his level of effectiveness almost immediately, that’s relevant, rather than irrelevant. It MATTERS whether he can hold on to that level of effectiveness. But basically that’s what I did, only I used 10 starts rather than 1 or 5.
Also, there is a very, very, very important difference between EXACTLY what I am trying to do and exactly what Tango is doing, I think, which has potentially major impact on the issue of whether what I have found is relevant to HIS method.
Tango says he uses about 1% "decay rate" per start, whereas I am using 3%. But on the issue of which value creates more accurate ratings, there is this very crucial difference: that I start every pitcher out at the bottom of the scale and make them fight their way up the chart, whereas I would not assume that Tango has done this. I am assuming that what Tango is saying is simply that, in valuing the pitcher’s statistics, he de-emphasizes what was done a year ago by 30%, two years ago by 51%, three years ago by 65.7%, etc. That is very different, in that it assumes that what happened in the more distant past was a vacuum, having no impact on the rating. My method essentially assumes that there’s a "dead weight" back there at the start of the process, and the pitcher has to prove that he has shaken off that dead weight in order to move up the scale. Some of you aren’t going to understand what in the hell I am talking about, but Tom will understand it. Tom is essentially starting everyone off in the middle of the scale. . .that’s not EXACTLY true, but that’s as close as I can come to explaining it. I’m starting everyone off at the BOTTOM of the scale. It makes a big difference.
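The 30%, 51% and 65.7% figures are what a constant 30% yearly discount compounds to, and a 1% per-start decay over a full season of starts lands close to that. A minimal sketch of the arithmetic; the 35-start season and the function name are assumptions made for illustration, not anything Tango or the study specifies.

```python
def retained_weight(rate_per_start, starts):
    """Fraction of a start's original weight left after `starts` later starts."""
    return (1.0 - rate_per_start) ** starts


STARTS_PER_SEASON = 35  # ASSUMPTION: roughly one full season of starts

# A 1% per-start decay compounded over one assumed season.
one_year = retained_weight(0.01, STARTS_PER_SEASON)
print(f"weight retained after one season at 1% per start: {one_year:.3f}")

# Compounding a 30% yearly discount gives the figures quoted in the text.
for years in (1, 2, 3):
    discount = 1.0 - 0.70 ** years
    print(f"{years} year(s) back: de-emphasized by {discount:.1%}")
```

Under that assumption a start from a year ago keeps about 70% of its weight, which is where the 30% figure comes from.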
The first place it makes a difference is in the "backward movement" between seasons. In the real system I let a pitcher’s "value number" decay slowly when he is not pitching, such as between seasons. It causes a pitcher to go backward by about 100 points over the winter, more or less; Kershaw ends one season at 620 and starts the next one at 520. This makes sense, to me, because pitchers DO very routinely lose all effectiveness over the course of a winter, so each pitcher has to "prove himself" again every year.
But in this test, when you start everybody out at 300.000 and apply the one-percent per-start adjustment that Tom suggested, nobody gets to 400 points in the first year, so everybody goes back to 300 at the start of every year. . .not absolutely, but generally; it takes a really strong season to get from 300 to 400 in one season, if we’re only making 1% adjustments per game.
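To see why a 1% weight keeps pitchers stuck near 300, here is a rough sketch of one season under two weights. The update formula is NOT given in the article: the sketch assumes a plain exponentially weighted update, with Game Score multiplied by 10 so that ratings land on the hundreds scale used here. The 33-start season and the steady Game Score of 60 are also made-up inputs.

```python
def update(rating, game_score, weight, scale=10.0):
    """Blend the old rating with the newest start.

    ASSUMPTION: illustrative exponentially weighted update; the real
    system's formula is not stated in the article.
    """
    return (1.0 - weight) * rating + weight * scale * game_score


def season_end_rating(weight, game_score=60, starts=33, start_rating=300.0):
    """Rating after a season of identical starts, beginning at the bottom of the scale."""
    rating = start_rating
    for _ in range(starts):
        rating = update(rating, game_score, weight)
    return rating


for weight in (0.01, 0.03):
    print(f"{weight:.1%} per start: {season_end_rating(weight):.0f}")
```

Under these assumptions a very good pitcher finishes the year at about 385 with the 1% weight and about 490 with the 3% weight, which matches the pattern described: under 400 in one case, somewhere between 450 and 500 in the other.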
So I had to fix the system for that problem. I did that by changing the "backward shift" between seasons to 33 points, rather than 100, when the weighting for each start was only 1%. When the weighting was changed to 1.5%, then the backward drift was increased to 50 points; when the weighting was increased to 2%, then the backward drift was increased to 66 points. When the weighting was changed to 2.5%, then the backward drift was 83 points, and when the weighting was 3% or more, then the backward drift was 100 points between seasons.
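The schedule above can be written as a small lookup. The function names are hypothetical; the second function records an observation about the listed values (each equals 100 times the weight divided by 3%, rounded down and capped at 100), which the text does not state as the rule.

```python
# Between-season backward shift in points, keyed by per-start weight in percent.
# Values are the ones listed in the text.
BACKWARD_SHIFT = {1.0: 33, 1.5: 50, 2.0: 66, 2.5: 83}


def backward_shift(weight_pct):
    """Points removed at the start of a new season for a given per-start weight."""
    if weight_pct >= 3.0:
        return 100
    return BACKWARD_SHIFT[weight_pct]  # KeyError for weights the text does not list


def truncated_rule(weight_pct):
    """OBSERVATION, not the stated rule: 100 * weight / 3, rounded down, capped at 100."""
    return min(100, int(100 * weight_pct / 3))


for w in (1.0, 1.5, 2.0, 2.5, 3.0, 6.5):
    print(w, backward_shift(w), truncated_rule(w))
```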
I don’t know if that makes sense to everybody. When you give more weight to each start, then the pitchers move more rapidly away from the 300.000 where everybody starts, so that in one season they are able to get to 450 or 500. Then, when you move everybody backward by 100 points, that doesn’t reset the system so that all pitchers start the season even; it just resets the point totals so that the good pitchers are still ahead, but the numbers are lower.
Anyway. . . what weighting you use sometimes makes a tremendous difference in where pitchers are rated. Wayne Twitchell in 1973 pitched a series of brilliant games. In a stretch of 11 starts he pitched four shutouts and several other excellent games, including a four-hitter in which the only run was unearned. If you weight each of those starts at 6.5%, then Twitchell at the end of that run ranks as the #11 starting pitcher in baseball. If you weight each of them at 1%, he ranks 97th. Big difference. Or. . .perhaps a more relatable example. . .remember that tremendous run that Kris Medlen had in 2012? If you weight each start at 6.5%, Medlen climbs to 26th in the ratings. If you weight each start at 1%, he is still 98th.
Or, on the other end, Pedro Martinez in 2008, when he made 20 starts with a 5.61 ERA. If you weight each start at 1%, Pedro still ranks as the #4 starting pitcher in baseball. If you weight each start at 6.5%, he ranks 105th. Significant difference there. Those are the extreme examples, but every great pitcher has a phase like that at the end of his career, when he can be ranked anywhere from 10th to 95th based on what weight you give to his recent outings. Roy Halladay, Steve Carlton, Randy Johnson. . .they all take a quick tumble like that late in their careers.
OK, so having explained that, here’s what happens. I rated all pitchers 1952 to 2013, and I rated them in each five-day window in that period. Then I eliminated all the data prior to 1960, because there are gaps in the data early and also you need time after you start the rating system to allow pitchers to find their level, and then I eliminated all the data from five-day windows in which there were fewer than 105 pitchers who made a start.
We have lots and lots of data; we don’t need to mess with the "weak" data. We can work only with the "strong" data. Sometimes you’ll have a five-day window at the start of a season or at the end of a season or over the All-Star break in which there are only a few games played, so the pitcher who rates #1 in that time period might be not a worthy representative of a #1 pitcher. We don’t need that data.
We are left with 1,412 rating periods, and thus with 14,120 pitchers who rank 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th or 10th during one of those rating periods. If we look at the next ten starts for those 14,120 pitchers, then, we would have 141,200 games, understanding that many of these are redundant counts, because the pitchers who rank highly in one five-day rating period almost always rank highly in the NEXT five-day rating period as well. We will call these the "follow up" games.
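A rough sketch of why the weight matters so much during a late-career slump. Everything numeric here is invented for illustration (a starting rating of 650 and twenty straight starts at a Game Score of 42), and the update formula is an ASSUMED exponentially weighted blend with Game Score scaled by 10; the article does not give the real formula, and these are not Pedro Martinez's actual figures.

```python
def update(rating, game_score, weight, scale=10.0):
    # ASSUMPTION: illustrative exponentially weighted update, not the real system's formula.
    return (1.0 - weight) * rating + weight * scale * game_score


def rating_after(start_rating, game_scores, weight):
    """Apply the update once per start, in order, and return the final rating."""
    rating = start_rating
    for gs in game_scores:
        rating = update(rating, gs, weight)
    return rating


slump = [42] * 20  # invented: twenty poor starts in a row

for weight in (0.01, 0.065):
    print(f"{weight:.1%} per start: {rating_after(650.0, slump, weight):.0f}")
```

With these made-up inputs the 1% weight leaves the pitcher at about 608, still near the top of the scale, while the 6.5% weight drops him to about 480.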
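The two filters described above (drop everything before 1960, drop thin five-day windows) might be sketched like this. The dictionary layout with `year` and `starters` keys is a hypothetical data structure chosen for the example, not how the study's data was actually stored.

```python
MIN_STARTERS = 105   # windows with fewer listed starters are dropped
FIRST_SEASON = 1960  # data before this season is dropped


def usable_windows(windows):
    """Keep five-day windows from 1960 on in which at least 105 starters are listed.

    `windows` is a HYPOTHETICAL list of dicts with 'year' and 'starters' keys;
    a pitcher who starts twice in a window appears twice in 'starters'.
    """
    return [
        w for w in windows
        if w["year"] >= FIRST_SEASON and len(w["starters"]) >= MIN_STARTERS
    ]


# Invented windows: one too early, one too thin, one usable.
windows = [
    {"year": 1955, "starters": ["p"] * 120},
    {"year": 1975, "starters": ["p"] * 60},
    {"year": 1975, "starters": ["p"] * 110},
]
kept = usable_windows(windows)
print(len(kept))
```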
We don’t quite have 141,200 follow up games, because not everybody makes 10 more starts before their career ends or we hit the end of the study period or something. We have 139,888 follow up games (for pitchers rated 1 through 10).
The Average Game score, for those 139,888 games by the pitchers rated 1 through 10 (with a 1% weighting for each start) was 56.49. They had 63,020 wins, 44,428 losses for a .587 winning percentage, and they had a 3.36 ERA.
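The three rates quoted here follow directly from the totals in the chart at the end of the study. A small sketch of the arithmetic, with hypothetical function names; ERA is computed from outs, at 27 outs per nine innings.

```python
def average_game_score(total_game_score, games):
    return total_game_score / games


def winning_percentage(wins, losses):
    return wins / (wins + losses)


def era(earned_runs, outs):
    # Nine innings is 27 outs, so ERA = 27 * earned runs / outs.
    return 27.0 * earned_runs / outs


# Totals for pitchers ranked 1 to 10 at the 1% weighting, taken from the chart.
print(round(average_game_score(7_902_691, 139_888), 2))
print(round(winning_percentage(63_020, 44_428), 3))
print(round(era(359_034, 2_884_593), 2))
```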
Next, let’s look at the pitchers who ranked 50th to 59th on the list. There are also 14,120 of these pitchers. These 14,120 pitchers had 136,885 follow up games. THEIR average game score was 50.32, their winning percentage was .502 (49,832-49,457) and their ERA was 4.12.
Finally, we look at the pitchers who ranked in spots 96 to 105 in each of the 1,412 study periods, another 14,120 pitchers. These pitchers had 125,599 follow up games, with a .450 winning percentage (39,365-48,163), and a 4.51 ERA.
So we have what a businessman would call proof of concept. The data shakes out the way it SHOULD shake out. The highly rated pitchers do in fact pitch much better, over their next ten starts, than the lower-rated pitchers. In the study, six pitchers had 10-start stretches in which they went 10-0: Roger Clemens, Bob Gibson, John Smoltz, Gaylord Perry, Justin Verlander and Brandon Webb. No one went 0-10; Hideo Nomo went 0-9 with an 8.60 ERA.
Now, having established that the process works, we can then compare whether it works better or worse if we increase the weight given to each start.
First, I compared the two ends of the chart—1% per start, and 6.5% per start.
It turns out, given this structure, that a 6.5% weighting works better than a 1% weighting. Again, cautionary note: this may not conflict directly with Tom Tango’s position, given his method. The heavier weighting works better, in my system, because it moves deserving pitchers up through the rankings more rapidly than a 1% weighting. But if you don’t start everybody out at the bottom of the scale, then you may not have that effect, so you might do better with a 1% weighting, I don’t know. (I’ll amend this comment later.)
But in THIS system. . .as I said, pitchers ranked 1-10 with a 1% weighting have an average Game Score for their next ten starts of 56.49, a winning percentage of .587, and an ERA of 3.36. But when we increase the weighting to 6.5%, then the average Game Score increases to 57.38, the winning percentage increases to .592, and the ERA drops to 3.28.
On the other end of the scale. . .not the EXTREME other end of the scale, because the extreme other end of the scale might be 132 pitchers or 162 or some other weird number. . .but the pitchers ranked 96 to 105, with the 1% weighting, have an average Game Score of 48.11, a winning percentage of .450, and an ERA of 4.51. But with the 6.5% weighting, the average Game Score drops to 47.28, the winning percentage drops to .443, and the ERA increases to 4.64. The 6.5% weighting does a better job of predicting future performance than does the 1% weighting.
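One simple way to put a number on "does a better job" is the gap in follow-up Game Score between the top group and the bottom group: the wider the gap, the better the ranking separates good from bad. This is only one possible yardstick, sketched with the averages quoted above; the dictionary layout and function name are made up for the example.

```python
# Average follow-up Game Score, from the figures quoted in the text.
avg_game_score = {
    0.01:  {"top": 56.49, "bottom": 48.11},   # ranks 1-10 and 96-105 at 1%
    0.065: {"top": 57.38, "bottom": 47.28},   # ranks 1-10 and 96-105 at 6.5%
}


def spread(groups):
    """Gap in average follow-up Game Score between the top and bottom groups."""
    return groups["top"] - groups["bottom"]


for weight, groups in avg_game_score.items():
    print(f"{weight:.1%} weighting: gap of {spread(groups):.2f} points")
```

The gap is 8.38 points at 1% and 10.10 points at 6.5%.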
1.5% is more effective at predicting future performance than 1%. 1.5% (as opposed to 1%) increases the average Game Score from 56.49 to 57.10, and the winning percentage from .587 to .592. The ERA drops from 3.36 to 3.29. Parallel changes on the other end of the scale. . .maybe I shouldn’t write out all of those.
2% is as effective as 6.5% at sorting out the top pitchers. With a 2% weight for each game, the pitchers ranked 1-10 have an average Game Score during their next ten starts of 57.35, a winning percentage of .593, and an ERA of 3.27, basically the same data as the 6.5% weighting.
2.5% is. . .well, the data is a little bit mixed, but it appears to be a tiny bit more effective than 2%. I’ll chart the data for you in a moment. The average Game Score in the next ten starts increases from 57.35 to 57.44, and the ERA drops from 3.27 to 3.26, although the Winning Percentage also drops.
3% is a tiny bit more effective than 2.5%.
3.5% is almost indistinguishable from 3%. . .just a tiny, tiny, tiny bit worse.
4% is indistinguishable from 3.5%.
4.5% looks the same as 4%.
OK, it’s 7:00 in the morning, I’ve been up all night, I’m tired and I need to get to bed, so I’m not going to run the data for 5%, 5.5% or 6%. I am almost embarrassed to admit this, but it appears that my instinct, 3%, may have been as good a place to put the correct weight as any other: not necessarily BETTER than other options, certainly not markedly better, but as good as any.
Here’s a chart of the data from the study:

Weight  Rank       Pitchers  Follow-up Games  Total Game Score  Avg Game Score   Wins  Losses  Win Pct
1%      1 to 10       14120           139888           7902691           56.49  63020   44428     .587
1%      50 to 59      14120           136885           6888067           50.32  49832   49457     .502
1%      96 to 105     14120           125599           6042409           48.11  39365   48163     .450

1.5%    1 to 10       14120           140300           8011761           57.10  64074   44175     .592
1.5%    50 to 59      14120           137777           6892017           50.02  49559   50349     .496
1.5%    96 to 105     14120           124878           5994503           48.00  38993   47805     .449

2%      1 to 10       14120           140428           8053345           57.35  64489   44288     .593
2%      50 to 59      14120           137841           6876234           49.89  49392   50862     .493
2%      96 to 105     14120           124188           5954408           47.95  38746   47432     .450

2.5%    1 to 10       14120           140500           8070611           57.44  64507   44580     .591
2.5%    50 to 59      14120           138405           6912131           49.94  50010   50975     .495
2.5%    96 to 105     14120           123391           5901065           47.82  38558   47101     .450

3%      1 to 10       14120           140525           8078235           57.49  64674   44537     .592
3%      50 to 59      14120           138626           6926620           49.97  50009   50916     .496
3%      96 to 105     14120           123398           5895252           47.77  38616   47226     .450

3.5%    1 to 10       14120           140526           8077707           57.48  64696   44556     .592
3.5%    50 to 59      14120           138522           6904902           49.85  49558   51364     .491
3.5%    96 to 105     14120           123107           5877328           47.74  38406   47025     .450

4%      1 to 10       14120           140491           8075853           57.48  64716   44580     .592
4%      50 to 59      14120           138447           6910182           49.91  49921   51130     .494
4%      96 to 105     14120           123082           5875938           47.74  38339   46935     .450

4.5%    1 to 10       14120           140469           8076205           57.49  64691   44581     .592
4.5%    50 to 59      14120           138442           6913713           49.94  49886   50956     .495
4.5%    96 to 105     14120           123300           5890179           47.77  38445   46932     .450

6.5%    1 to 10       14120           140380           8055387           57.38  64556   44449     .592
6.5%    50 to 59      14120           138493           6939193           50.11  50193   50824     .497
6.5%    96 to 105     14120           119568           5653119           47.28  36743   46183     .443

Weight  Rank          Outs  Earned Runs   ERA
1%      1 to 10    2884593       359034  3.36
1%      50 to 59   2547200       388793  4.12
1%      96 to 105  2176415       363927  4.51

1.5%    1 to 10    2920220       356199  3.29
1.5%    50 to 59   2552887       393337  4.16
1.5%    96 to 105  2157562       361694  4.53

2%      1 to 10    2933198       355277  3.27
2%      50 to 59   2548823       395337  4.19
2%      96 to 105  2141380       359162  4.53

2.5%    1 to 10    2939608       355180  3.26
2.5%    50 to 59   2565213       397254  4.18
2.5%    96 to 105  2123889       357815  4.55

3%      1 to 10    2942246       355158  3.26
3%      50 to 59   2568881       397638  4.18
3%      96 to 105  2123755       358822  4.56

3.5%    1 to 10    2942569       355362  3.26
3.5%    50 to 59   2562431       399032  4.20
3.5%    96 to 105  2116249       357931  4.57

4%      1 to 10    2943089       355572  3.26
4%      50 to 59   2564417       397879  4.19
4%      96 to 105  2116549       358093  4.57

4.5%    1 to 10    2943181       355529  3.26
4.5%    50 to 59   2564429       397245  4.18
4.5%    96 to 105  2122500       358410  4.56

6.5%    1 to 10    2935716       356274  3.28
6.5%    50 to 59   2569612       394443  4.14
6.5%    96 to 105  2048380       352094  4.64

The one conclusion that I can firmly reach after doing this is that 1% or 1.5% is clearly not the right answer, and I SUSPECT, although I can’t prove this, that it isn’t the right answer in Tom’s method, either. The reason that 1% is not the right weight is that it fails to pick up on pitchers’ loss in effectiveness near the end of their careers.
"Near the end of their careers" is a fraught term. It sounds like we are talking about Roger Clemens in 2004, Chuck Finley in 2002, Kevin Appier in 2002, Jack Morris in 1993, etc., and yes, we are talking about them. Almost every outstanding pitcher has a moment at the end of his career in which (a) he has not been pitching well, and (b) he is going to get really hammered during his next ten starts, but (c) he would still rank as one of the ten top pitchers in baseball at that moment, based on a 1% weighting scale, because the weight isn’t enough to pick up what is happening.
But it is not JUST that moment. There is a Sonny Gray/Jose Quintana moment that happens much earlier to many pitchers. Jose Quintana is not old and has not yet had a "great" career, but he is not pitching well at the moment, either. A lot of guys hit the end of the road when they are 26, 27, 28, 29 years old. A 1% weighting system isn’t going to see that when it is happening.
Other than that. . .you can use 2%, 3%, anything up to 6%, and one predicts the future about as well as another.