
Recent Game vs. Full Season Performance for Pitchers

September 1, 2015
  

Recent Game Vs. Full Season Performance For Starting Pitchers

 

Hey, Bill:  If you were choosing a pitcher to start a critical game, say a one-game playoff or game 7 of a series, would you be more likely to choose the pitcher on your team who has the highest pitcher score or the pitcher who has pitched best over the past four or five starts?  Or would you use other criteria or a combination of criteria?  --Flying Fish

 

The general principle is that more information is usually better than less, but I decided to study the specific question. I took my database of games, 1950 to 2014, and looked at the question of how well one game by a starting pitcher predicts the next, how well the previous TWO games predict the next, three games, four games, etc., up to 30 games. To begin with, I took all game lines in the data, a little more than 240,000 lines, and marked each by the career start number. Then I eliminated the first 30 starts from each pitcher’s career, so that pitchers who hadn’t been around for a full season wouldn’t be included in the study, since they aren’t relevant to the Fish’s question. This left 179,844 game lines in the data.
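The numbering-and-trimming step described above can be sketched roughly as follows. This is only an illustration: the record layout (`pitcher`, `date`, `game_score`) and the function names are hypothetical, not the study's actual data format.

```python
from collections import defaultdict


def number_career_starts(game_lines):
    """Tag each game line with the pitcher's career start number (1-based).

    `game_lines` is a list of dicts with hypothetical keys "pitcher" and
    "date"; dates only need to sort chronologically (ISO strings work).
    """
    counts = defaultdict(int)
    numbered = []
    for line in sorted(game_lines, key=lambda g: (g["pitcher"], g["date"])):
        counts[line["pitcher"]] += 1
        numbered.append({**line, "start_no": counts[line["pitcher"]]})
    return numbered


def drop_early_starts(numbered_lines, skip=30):
    """Keep only starts after the first `skip` starts of each career."""
    return [g for g in numbered_lines if g["start_no"] > skip]
```

A pitcher with 32 career starts would contribute only starts 31 and 32 to the trimmed data, and a pitcher with fewer than 31 starts would contribute nothing.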

               Sort those, first, by the pitcher’s performance (Game Score) in his previous start.   In the top 10% there are 17,984 games.    The pitchers who had pitched best in their previous starts went 7,486-6,467 with a 3.55 ERA, while those who had pitched worst in the previous start went 6,328-6,680 with a 4.23 ERA:

 

Group  Games   Won  Lost  WPct   A GS        IP       H     BB       K   ERA
A1     17984  7486  6467  .537  54.41  121970.2  115123  37427   83635  3.55
B1     17984  7156  6369  .529  52.71  117362.2  114433  36767   77687  3.77
C1     17984  6977  6350  .524  52.30  116155.2  113245  36995   75307  3.84
D1     17985  6949  6449  .519  51.70  114741.0  113241  37350   73653  3.92
E1     17985  6743  6512  .509  51.11  113219.2  112942  37565   70972  3.99
F1     17985  6587  6591  .500  50.60  112008.1  112979  38008   70029  4.08
G1     17985  6540  6620  .497  50.71  111756.1  112287  37516   69249  4.05
H1     17984  6389  6646  .490  50.17  110327.0  111832  37987   67241  4.13
I1     17984  6305  6743  .483  49.91  109769.1  111560  38259   65320  4.17
J1     17984  6328  6680  .486  49.62  109961.0  112767  38240   66426  4.23

 

               Group A1 is the pitchers who had pitched best (Group A) in their previous ONE (1) start; A2 is those who had pitched best in their previous two starts, etc.  I have more data than this, but the additional data doesn’t fit.  The "A1" group pitchers pitched 4,406 complete games including 1,049 shutouts, and the won-lost record of their TEAMS, as opposed to the pitchers themselves, was 9,589-8,376 (19 ties).  
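As a rough sketch of how the ten groups could be formed: rank the game lines by the Game Score of the previous start and cut the ranking into ten near-equal slices, A (best) through J (worst). The function name and the `prev_gs` key are hypothetical, and how ties and the leftover lines are distributed here is an assumption; the tables above put the four extra lines in groups D through G, which this simple cut does not try to reproduce.

```python
import string


def split_into_deciles(game_lines, key, suffix):
    """Rank game lines from best to worst by `key` and cut them into ten
    near-equal groups labeled A (best) through J (worst)."""
    ranked = sorted(game_lines, key=key, reverse=True)
    n = len(ranked)
    # Integer cut points; group sizes differ by at most one line.
    bounds = [i * n // 10 for i in range(11)]
    return {
        f"{string.ascii_uppercase[i]}{suffix}": ranked[bounds[i]:bounds[i + 1]]
        for i in range(10)
    }


# Toy data: 25 game lines keyed by the Game Score of the previous start.
lines = [{"prev_gs": gs} for gs in range(30, 80, 2)]
groups = split_into_deciles(lines, key=lambda g: g["prev_gs"], suffix=1)
print({label: len(members) for label, members in groups.items()})
```

Passing `suffix=2` with a key that averages the previous two starts would give the A2 through J2 labels in the same way.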

               Anyway, we can see that the previous start predicts the next one to a limited extent, and in the next chart we can see that the extent to which previous performance predicts next-start performance increases when we use the previous two starts, rather than one:

 

Group  Games   Won  Lost  WPct   A GS        IP       H     BB       K   ERA
A2     17984  7781  6243  .555  55.30  123183.2  114253  36920   87266  3.44
B2     17984  7287  6395  .533  53.14  118501.2  114329  37111   78824  3.71
C2     17984  7164  6272  .533  52.67  116776.1  112992  37085   76150  3.77
D2     17985  6816  6436  .514  51.62  114422.1  113256  37241   72794  3.92
E2     17985  6608  6712  .496  50.92  113539.2  114229  37603   70928  4.03
F2     17985  6686  6531  .506  51.01  112891.1  112852  37627   69890  3.99
G2     17985  6500  6627  .495  50.14  111236.1  113010  38075   68153  4.16
H2     17984  6401  6785  .485  50.08  110421.1  111862  38094   66765  4.15
I2     17984  6188  6705  .480  49.52  108860.2  111736  38115   65035  4.24
J2     17984  6029  6721  .473  48.84  107438.1  111890  38243   63714  4.36

 

               A GS is "Average Game Score".  The predictive power of the previous games increases again when we go from two starts to three:

 

Group  Games   Won  Lost  WPct   A GS        IP       H     BB       K   ERA
A3     17984  7869  6204  .559  55.86  124107.1  113545  37284   89691  3.38
B3     17984  7390  6255  .542  53.61  119265.1  113821  37350   80069  3.64
C3     17984  6958  6495  .517  52.40  116823.0  114391  37156   76130  3.81
D3     17985  7045  6397  .524  51.99  115486.1  113406  37210   73051  3.85
E3     17985  6777  6512  .510  51.21  113605.0  113380  37277   71642  3.98
F3     17985  6612  6571  .502  50.75  112537.1  113134  37376   69182  4.05
G3     17985  6386  6677  .489  50.28  111315.0  112666  37867   67623  4.12
H3     17984  6369  6648  .489  49.88  109982.1  112125  38208   66094  4.17
I3     17984  6229  6771  .479  49.11  108340.2  112354  38153   63935  4.30
J3     17984  5825  6897  .458  48.14  105809.1  111587  38233   62102  4.50

 

               Whereas the 10% of starting pitchers who had pitched well in their previous ONE start had a .537 winning percentage and a 3.55 ERA, the 10% of pitchers who had pitched well in their previous THREE starts had a .559 winning percentage and a 3.38 ERA, and whereas the 10% of pitchers who had pitched most badly in their previous one start had a .486 winning percentage and a 4.23 ERA, those who had pitched most poorly in their previous three starts had a .458 winning percentage and a 4.50 ERA.  More information is better; the predictive power increases.  
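The quantity being ranked in each chart is the pitcher's average Game Score over his previous N starts, not counting the start being predicted. A minimal sketch of that trailing average for one pitcher, assuming his Game Scores are already in chronological order (the function name is made up for illustration):

```python
def trailing_average(scores, n):
    """For each start, return the mean Game Score of the previous `n` starts.

    The current start is never included in its own average. Starts with
    fewer than `n` earlier starts get None.
    """
    averages = []
    window_sum = 0
    for i, score in enumerate(scores):
        averages.append(window_sum / n if i >= n else None)
        window_sum += score              # current start joins the window
        if i >= n:
            window_sum -= scores[i - n]  # oldest start leaves the window
    return averages


print(trailing_average([50, 60, 70, 40], 2))  # [None, None, 55.0, 65.0]
```

In the study itself every pitcher's first 30 starts were removed, so each remaining game line has a full 30 earlier starts to draw on and the None case never arises.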

               This continues to be true as we add more starts—up to at least 15 starts.   Since really only the "A" group and the "J" group are interesting, I will trim the charts to just those:

Group  Games   Won  Lost  WPct   A GS        IP       H     BB       K   ERA
A1     17984  7486  6467  .537  54.41  121970.2  115123  37427   83635  3.55
A2     17984  7781  6243  .555  55.30  123183.2  114253  36920   87266  3.44
A3     17984  7869  6204  .559  55.86  124107.1  113545  37284   89691  3.38
A4     17984  8014  6035  .570  56.41  125072.0  113309  37386   91922  3.31
A5     17984  8061  6022  .572  56.77  125538.0  112768  37321   93237  3.27
A6     17984  8114  6000  .575  57.01  126016.1  112622  37200   94439  3.25
A7     17984  8170  5929  .579  57.09  126194.1  112588  37416   95276  3.25
A8     17984  8124  5941  .578  57.20  126371.0  112532  37477   95882  3.23
A9     17984  8186  5934  .580  57.37  126682.0  112420  37635   96889  3.22
A10    17984  8254  5938  .582  57.49  126822.0  112245  37546   97149  3.20
A11    17984  8264  5908  .583  57.73  127260.2  112098  37690   97819  3.17
A12    17984  8295  5923  .583  57.80  127486.2  112287  37584   98498  3.16
A13    17984  8295  5914  .584  57.84  127482.0  112197  37622   98673  3.16
A14    17984  8357  5902  .586  58.00  127778.0  112032  37652   99013  3.14
A15    17984  8382  5887  .587  58.05  127938.1  111912  37835   99256  3.14

 

               In this chart, then, we can see that the predictive power of the previous starts increases when more starts are considered; that is, the winning percentage of the "best" pitchers improves, and the ERA declines, as more starts are considered, up to 15.    Also, when we look at the worst pitchers, they continue to get worse:

 

Group  Games   Won  Lost  WPct   A GS        IP       H     BB       K   ERA
J1     17984  6328  6680  .486  49.62  109961.0  112767  38240   66426  4.23
J2     17984  6029  6721  .473  48.84  107438.1  111890  38243   63714  4.36
J3     17984  5825  6897  .458  48.14  105809.1  111587  38233   62102  4.50
J4     17984  5803  6943  .455  47.86  105401.1  112056  38155   60987  4.54
J5     17984  5778  6896  .456  47.56  104591.2  111897  38231   60107  4.60
J6     17984  5730  6924  .453  47.35  104018.0  111640  38230   59474  4.64
J7     17984  5712  6970  .450  47.16  103810.1  112012  38418   58909  4.67
J8     17984  5640  7022  .445  46.98  103305.2  111984  38149   58341  4.71
J9     17984  5552  7024  .441  46.78  103156.0  112183  38484   58200  4.76
J10    17984  5536  7112  .438  46.73  103181.2  112333  38342   57968  4.76
J11    17984  5559  7064  .440  46.69  103097.0  112483  38366   57902  4.76
J12    17984  5534  7125  .437  46.56  102845.2  112468  38373   57762  4.80
J13    17984  5503  7123  .436  46.51  102689.2  112374  38380   57411  4.81
J14    17984  5528  7088  .438  46.42  102539.0  112545  38428   57449  4.83
J15    17984  5556  7122  .438  46.31  102351.1  112764  38354   57250  4.85

 

               The predictive power of the last six starts is twice the predictive power of the last one start, but the predictive power of the last 15 starts is only 20% greater than the predictive power of the last six starts.    The predictive value of each additional start is less than the predictive value of the previous one. 
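The article does not say exactly how "predictive power" is measured. One plausible reading, offered here only as an assumption, is the gap between the best (A) and worst (J) groups. On that reading the figures in the tables above give ratios close to the ones quoted:

```python
# Figures copied from the A and J rows of the tables above,
# stored as (A group, J group) for 1, 6 and 15 previous starts.
era = {1: (3.55, 4.23), 6: (3.25, 4.64), 15: (3.14, 4.85)}
wpct = {1: (0.537, 0.486), 6: (0.575, 0.453), 15: (0.587, 0.438)}


def gap(table, n_starts):
    """Absolute difference between the A and J groups for `n_starts`."""
    a_value, j_value = table[n_starts]
    return abs(a_value - j_value)


def ratio(table, more_starts, fewer_starts):
    """How many times larger the A-to-J gap is with more starts considered."""
    return gap(table, more_starts) / gap(table, fewer_starts)


for name, table in (("ERA", era), ("WPct", wpct)):
    print(name, round(ratio(table, 6, 1), 2), round(ratio(table, 15, 6), 2))
```

By ERA the gap grows from 0.68 runs (one start) to 1.39 (six starts) to 1.71 (15 starts), which is about 2.0 times and then about 1.2 times. By winning percentage the same comparison gives about 2.4 times and about 1.2 times.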

After 15 to 20 starts, the charts flatten out to such an extent that it is difficult to say with confidence that any additional gains are being made. There are two other things that are happening, beyond the natural law of diminishing returns. Since a pitcher only makes about 30 starts in a season, after 15 starts about half of the "old" starts we are adding into the data are from the previous season. It is likely that last year’s data has less predictive value than this year’s data, even if this year’s data is from three months ago.

Also, there is a technical issue with using the Game Score, unadjusted, as the indicator of how well the pitcher has pitched, since the Average Game Score by a pitcher is higher in 1968 than in the steroid era. This is a small effect, and it isn’t an actual problem in the part of the study where we are measuring noticeable effects, but as the effects being measured grow smaller, the technical issue becomes more significant relative to the effects being measured. As a consequence of these things, after 15 starts some measures go one way and some go another, and it is unclear whether we’re actually gaining any more useful information or not. This chart compares the data for the last five starts with the data for the last ten, 15, 20, 25 or 30:

Group  Games   Won  Lost  WPct   A GS        IP       H     BB       K   ERA
A5     17984  8061  6022  .572  56.77  125538.0  112768  37321   93237  3.27
A10    17984  8254  5938  .582  57.49  126822.0  112245  37546   97149  3.20
A15    17984  8382  5887  .587  58.05  127938.1  111912  37835   99256  3.14
A20    17984  8338  5858  .587  58.15  127927.1  111756  37984  100165  3.12
A25    17984  8329  5937  .584  58.19  128074.2  111867  37721  100592  3.13
A30    17984  8393  5917  .587  58.31  128204.1  111756  37738  100877  3.11

 

                 

 

Group  Games   Won  Lost  WPct   A GS        IP       H     BB       K   ERA
J5     17984  5778  6896  .456  47.56  104591.2  111897  38231   60107  4.60
J10    17984  5536  7112  .438  46.73  103181.2  112333  38342   57968  4.76
J15    17984  5556  7122  .438  46.31  102351.1  112764  38354   57250  4.85
J20    17984  5552  7111  .438  46.43  102754.2  112716  38087   57482  4.83
J25    17984  5522  7034  .440  46.27  102351.0  112901  38091   57087  4.86
J30    17984  5501  7060  .438  46.19  102273.1  112943  38122   56993  4.87

 

               Probably the data continues to gain predictive significance after 15+ starts have passed, but the gains are very small and we cannot be certain that they are real.  

               So the answer to your question, at this point, is that the last year’s data is a better predictor of pitcher performance than the last five games, but that one should not ignore the last five games, either.   Suppose that a pitcher has pitched moderately well over the last 30 games, but extremely poorly over the last five?   Then the short-term effects might be more important than the longer-term effects, and perhaps the answer is that one should go with the hot hand. 

               Or not.  

               I did one more study.   Suppose that we compare a pitcher who has been pitching well lately but has not pitched well over a longer term with a pitcher who has pitched well over his last 30 starts, but has pitched poorly over his last five starts?

               I figured for each pitcher his G5 – G30; that is, his average Game Score over his last 5 starts minus his average Game Score over his last 30 starts, but with the additional qualification that to be in the top group the pitcher must have genuinely pitched poorly over his last 30 starts, and to be in the bottom group the pitcher must have genuinely pitched poorly over his last 5 starts.  
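A rough sketch of that selection, under stated assumptions: each record carries the two averages as `g5` and `g30` (hypothetical key names), and "genuinely pitched poorly" is represented by a single made-up cutoff, since the article does not give the actual threshold.

```python
def extreme_groups(pitchers, k=1000, poor_cutoff=45.0):
    """Pick the `k` most extreme cases at each end of G5 - G30.

    `poor_cutoff` is an illustrative stand-in for "genuinely pitched
    poorly"; the study's real qualification is not stated.
    """
    scored = [{**p, "diff": p["g5"] - p["g30"]} for p in pitchers]
    # Largest G5 - G30: good lately, and genuinely poor over 30 starts.
    hot_lately = sorted(
        (p for p in scored if p["g30"] < poor_cutoff),
        key=lambda p: p["diff"],
        reverse=True,
    )[:k]
    # Smallest G5 - G30: good over 30 starts, genuinely poor over the last 5.
    cold_lately = sorted(
        (p for p in scored if p["g5"] < poor_cutoff),
        key=lambda p: p["diff"],
    )[:k]
    return hot_lately, cold_lately


sample = [
    {"name": "hot", "g5": 65, "g30": 42},
    {"name": "cold", "g5": 30, "g30": 58},
    {"name": "steady", "g5": 52, "g30": 51},
    {"name": "bad", "g5": 40, "g30": 41},
]
hot, cold = extreme_groups(sample, k=1)
print(hot[0]["name"], cold[0]["name"])
```

The extra qualification matters because a large difference alone does not say which side is unusual: a pitcher who is merely average over 30 starts but brilliant over five would otherwise land in the same group as one who has been poor all year.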

               Conclusion?   You want the pitcher who has pitched well over his last 30 starts—absolutely and without question.   I looked at the 1000 most extreme examples on each end. 

Among the pitchers who had pitched well over their last 30 starts but poorly over their last five was, for example, Marty Pattin in his start of May 8, 1973. In his previous five starts (April 16 to May 4, 1973) he had lost all five, had pitched fewer than four innings per start, and had an ERA of 11.17. But in his last 30 starts (June 20, 1972 to May 4, 1973) he had pitched 220 innings and had gone 16-11 with a 3.07 ERA. Pattin pitched very well in that game, although he lost the game 1-0.

               Second example:  Gaylord Perry in his start of August 15, 1974.  In his previous five starts he had pretty much been pounded every time, giving up a total of 44 hits and 31 runs in 38 and a third innings, and losing all five starts.   But in his previous 30 starts he was 20-7 with a 2.27 ERA, meaning that in the 25 starts BEFORE the last five he was 20-2 with an ERA under 1.50.   Gaylord pitched well in his start of August 15, and won the game.

On the other end we have, for example, Brian Bohanon in his start of September 15, 1999. In his previous five starts he had pitched 8 innings, 9 innings, 7 innings, 8 innings and 8 innings, and his ERA for those five starts was 2.25, with 32 strikeouts and 12 walks in 40 innings. But over his last 30 starts, although he was 12-11, he had pitched 182 innings, struck out 110, walked 82, and had a 5.79 ERA.

Bohanon was hit hard in his start of September 15, and the next one, and the next one, and the next one, and the next one. He was Brian Bohanon; he was what he was. It’s baseball. The cream doesn’t ALWAYS rise to the top, but the sand always falls to the bottom.

               Comparing 1,000 pitchers in each group, the pitchers who had pitched poorly in their last 5 starts but had pitched well over their last 30 starts had a Winning Percentage in their next start of .538, and an ERA of 3.40.   The pitchers in the opposite group had a Winning Percentage of .438, and an ERA of 4.70.  
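For anyone recomputing group figures like these, the two summary numbers are simple ratios, with one wrinkle: the IP columns in the tables use baseball notation, where ".1" and ".2" mean one-third and two-thirds of an inning. A small sketch with hypothetical helper names; the tables do not list earned runs, so the ERA call below uses made-up totals.

```python
from fractions import Fraction


def innings(ip_text):
    """Convert baseball-notation innings such as "121970.2" to an exact value.

    The digit after the point counts thirds of an inning, not tenths.
    """
    whole, _, thirds = ip_text.partition(".")
    return Fraction(int(whole)) + Fraction(int(thirds or 0), 3)


def winning_pct(won, lost):
    """Wins divided by decisions; no-decisions are ignored."""
    return won / (won + lost)


def era(earned_runs, ip_text):
    """Earned runs allowed per nine innings."""
    return float(9 * earned_runs / innings(ip_text))


print(round(winning_pct(7486, 6467), 3))  # A1 row: 0.537
print(era(3, "9.0"))                      # made-up totals: 3.0
```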

 

 

 
 

COMMENTS (16 Comments, most recent shown first)

evanecurb
I had wondered how Gaylord Perry managed to lose 16 games during that 1974 season. Now I know. If I remember correctly, he had a lengthy winning streak that year in mid summer: 14 or 15 in a row, something like that. For Cleveland, no less.
9:54 AM Sep 5th
 
evanecurb
Whenever I think about managers selecting pitchers to start big games, I am reminded of an interview with Casey Stengel a few years before he passed away. They asked him why he started Johnny Kucks in the seventh game of the 1956 World Series, and he said (as only Casey could) "Well the fella was from New Jersey and I'll tell ya how they celebrated. They just put the keg of beer on the sidewalk and danced in the streets."

So now you know how Casey did it.
9:03 PM Sep 4th
 
bjames
It's been a few years and I'm not sure how relevant it is, but I once did a study of pitchers with losing records who started a World Series game. You know this has been a few years, because nobody would do that study now; now you'd study all post-season games, and you probably wouldn't use the won-lost record. Anyway, the conclusion was; they get murdered. You look at pitchers with losing records who wind up starting a World Series game . ..BAD results. I would assume that most or many of those pitchers started a World Series game because they were perceived to be pitching well recently.
11:36 AM Sep 3rd
 
flyingfish
MarisFan61: I am not sure I agree with your last comment. How often do you hear managers say "Well, I'm going with the guy who has the hot hand" when discussing how to choose the rotation for a critical series? I hear it a LOT.

As for Zack Greinke, I wish he were pitching for my team. :)
6:17 PM Sep 2nd
 
MarisFan61
I thought it would be good to try to add, as a point of reference for the article, what it is that teams have actually been doing in such situations. I think probably the way such actual decisions have been made for such post-season games mostly follows what Bill finds, but with some role of things like what some of us have brought up.

By far for the most part, teams do follow longer-term success. In particular, if they have a guy who they've considered for a while to be their "ace," it's not easy to move him off the chosen spot. Absent that, there's a strong drive toward picking a guy who's been doing it all year.

But, there are exceptions. In line with what Bill finds, having a few bad games near the end of the season wouldn't generally take a guy off the spot. But, if it felt there's something physically wrong with him, whether injury or tired arm or whatever it might be, or even if he "hasn't been looking good" (whatever that might mean) in those games, he could be bumped or even removed.

So, I think this study actually supports what's usually done. Does it argue against the occasional exceptions? I think not particularly. It says that in general teams are best off to follow longer-term success. I think that's what they generally do, and when there are exceptions, I think it's a case-by-case kind of thing that might be hard to study systematically.
1:03 PM Sep 2nd
 
MarisFan61
Thanks for addressing it so specifically -- you really did. I would add -- and I wish I could put this in tiny print, like we can do in Reader Posts, because it's probably only worth a footnote if anything, that I see an additional quibble. Rather than taking up any further space and rhetoric here, I'll put it in a "Hey Bill" to see if you think it's anything.

I love the Greinke example because it shows how there can always be a "yes but" about such things, although we can gather from this study that there are probably fewer "yes buts" than might have been thought.
11:27 PM Sep 1st
 
bjames
And, of course, one can sometimes "see" changes in a pitcher's skill level without relying on the stats. Zack Greinke in 2008 was kind of a .500 pitcher most of the year, but I moved back from Boston to KC in August, 2008, and went to a game in KC. I happened to have really good seats behind the first base dugout, and after the game I wrote back to the guys in Boston that Greinke had become the best pitcher in baseball, which he demonstrated the next season that he was. You could just see it. . .he was throwing 96 with no effort and had A+ command of every pitch. I remember a similar game that Dave Stewart pitched in KC (against KC) in late 1984; Dave Stewart in 1984 was 7-14 with a 4.73 ERA, but I remember thinking "I don't know what's up with the record, but that was one hell of a pitcher I saw tonight." He actually didn't get it turned around until mid-season 1986, but it was always there.
11:04 PM Sep 1st
 
bjames
Responding to the late-season concern as best I can, I took a copy of the data and eliminated from it everything except September and October starts. Since there were 179,000+ data lines in the full-season study, this left us with 31,105 data lines in the September/October version of the study, or 3,110/3,111 data lines in each 10% "file" such as A1, A2, B1, J30, etc. This wasn't difficult to do; all I had to do was eliminate the pre-September starts and re-calculate the results; just took me 30 to 45 minutes.

You may think that 3,000 data lines is a good number, but you certainly do get some instability in the data, with 3,000 lines in each file, that you would not get with 18,000 in each file. There are also two other changes noted: 1) that all of the ERAs are lower and the average game scores higher, since September tends to be a pitcher's month, and 2) that there is more decentralization in the data--that is, the top and bottom move further away from one another--due to the "September decentralization." This time of year, a few teams just more or less quit.

Aside from those effects, the data patterns in the September study are the same as noted in the larger study. The data flattens out after about 15 starts; however, the pitchers who have been good over the last 30 starts are substantially better in their next start than those who have been good over their last 4 or 5 starts, and 30 starts is probably (though less clearly) a better "read" on the pitcher's current effectiveness than is 15 starts.

Of course pitchers' levels of effectiveness sometimes change, particularly for young pitchers. Randy Johnson in 1995 is not the same as Randy Johnson in 1992; Roy Halladay in 2001 was not at all the same as Roy Halladay in 2000, and Zack Greinke in 2005 was 5-17 with a 5.80 ERA. It is a question of what you bet on. You don't bet on the last 4-5 starts; you bet on the last 15 or the last 30.
10:43 PM Sep 1st
 
flyingfish
Sorry, you actually said more information is USUALLY better than less. I need to read more carefully.
6:52 PM Sep 1st
 
flyingfish
Thanks for answering my question, Bill. Very interesting indeed. As you said in the beginning of your article, more information always is better than less, and so I think when we get to the way MarisFan61 rephrased my question to focus on the postseason, or indeed the question as I originally phrased it, we do have additional information. For example, Joe Kelly of the Red Sox has not been very good overall this season but his last several (5?) starts have been very good. Will he revert to the old Joe Kelly? Well, we know that he says that in his recent trip to AAA, he LEARNED TO PITCH DIFFERENTLY. Other players, both teammates and opponents, have said he looks like a better pitcher recently. I'm not quite sure just how much to make of that information, but it certainly helps to some degree; you wouldn't be using only his statistical record because you know some other, relevant, things. Similarly with a pitcher who is pitching through a seemingly minor injury and then ends up on the disabled list (Stephen Strasburg comes to mind, more than once). If HE starts pitching badly, then I'm going to need to find out if he's hurt. Thanks again.
6:08 PM Sep 1st
 
DanaKing
As always, lots of interesting stuff here. Thanks for taking the time on this, Bill.

To me, what this shows more than anything is that, while recent stats can clearly show if a player is hot or cold, it's almost impossible to show whether said player will remain hot or cold in his next game. There are too many variables, not the least of which is human frailty. (Some days or weeks people just feel better or worse physically, and it affects their performance.)
12:52 PM Sep 1st
 
stevekohlhagen
great analysis, bill.

however, as we all know, no matter how much you show objective evidence that forecasting future performance off of someone's subjective notion of who's hot and who's not doesn't work, some people will resort to emotional claims that THEY can. despite all evidence to the contrary.

human nature i guess.

great analysis. thanks. swk
12:30 PM Sep 1st
 
78sman
Bill, this is great information.

MarisFan, your point is good too. Data do not always tell us why a pitcher performed well or poorly in recent starts, so Bill's analysis did not account for reasons why a pitcher had done well or poorly in recent starts. If Bill had been able to separate out pitchers who had an important nagging injury or a dead arm, then one would expect different results for that sub-group.

I love Bill's analysis, and I think that MarisFan has added an important caveat.
11:18 AM Sep 1st
 
MarisFan61
P.S. Interestingly, while in general it's no problem for such a study to ignore pitchers who haven't been around for a full season and to eliminate the first 30 starts of a pitcher's career from consideration, if we're looking most particularly at the coming post-season -- which I do think was the main thrust of the question -- there is indeed at least one such pitcher who may be a prominent part in such a choice: Luis Severino of the Yankees.
11:14 AM Sep 1st
 
MarisFan61
Yes, but..... :-)
I think this way of doing the study might be sort of drowning the specific thing we're looking for. I don't know that it makes a difference but it's not hard to imagine that it might.

It depends on what we're really looking for, and that depends on how we see the original question:

If you were choosing a pitcher to start a critical game, say a one-game playoff or game 7 of a series, would you be more likely to choose the pitcher on your team who has the highest pitcher score or the pitcher who has pitched best over the past four or five starts? Or would you use other criteria or a combination of criteria?

Bill, it seems you took the question broadly: emphasizing the first phrase in the question, and taking the next part just as a fr'instance. I'd guess his emphasis was the other way around: that he was really talking mainly about the post-season and that the first phrase was just a lead-in, since it's a question that's 'in the air' a lot right now, because of the time of year, and what people almost always mean is the post-season.

What you looked at does answer the broad question. But if the emphasis was the other thing, I think it might well make a difference that we're talking about the end part of the season.

As I said, I don't know that it makes a difference but I can imagine it might: Pitchers are more likely to have tired arms toward the end of the season, and some pitchers do and some don't; obviously the ones who don't have tired arms have an advantage; and it may well lead to different correlations than looking at performance at all times of the season.

If one agrees with this and wanted to do a similar study looking at the question in this narrower way, I think there would be traps. It might seem like the most direct way would be to look at post-season performances by starting pitchers and seeing how they correlate to prior groups of their games. But, that would be fraught, because the way pitchers are and aren't used in the post-season takes into account how their arms are. Pitchers who have tired arms or sore arms would tend either not to have been used in that post-season or to have been given extra rest. Those things would tilt the data away from answering the specific thing that I think we're looking at.

I guess the point in what I'm saying is that the answer depends on why a pitcher with a good longer-term record has done worse lately, and that when such a thing happens late in the season, it may be more likely than usual (for any pitcher, good or bad long-term record) that it's related to his arm being compromised. For pitchers whose arms aren't particularly compromised, I'd say the answer is covered very well by this study. Otherwise, it's a different story. And actually in a way this is just a "d'oh" thing, because probably anybody would say "Of course you don't start a pitcher with a bad arm."

OK, now I better run and hide before being trampled.... :-)
10:59 AM Sep 1st
 
mrkwst22
....and that ladies and gentlemen is why Bill James is a National Treasure.
8:33 AM Sep 1st
 
 