Recent Game vs. Full Season Performance for Pitchers

By Bill James

September 1, 2015

Recent Game Vs. Full Season Performance For Starting Pitchers

Hey, Bill: If you were choosing a pitcher to start a critical game, say a one-game playoff or game 7 of a series, would you be more likely to choose the pitcher on your team who has the highest pitcher score or the pitcher who has pitched best over the past four or five starts? Or would you use other criteria or a combination of criteria? --Flying Fish

The general principle is that more information is usually better than less, but I decided to study the specific question. I took my database of games, 1950 to 2014, and looked at how well one game by a starting pitcher predicts the next, how well the previous TWO games predict the next, three games, four games, etc., up to 30 games. To begin with, I took all game lines in the data (a little more than 240,000 lines) and marked each by the career start number. Then I eliminated the first 30 starts from each pitcher’s career, so that pitchers who hadn’t been around for a full season wouldn’t be included in the study, since they aren’t relevant to the Fish’s question. This left 179,844 game lines in the data.
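The numbering-and-trimming step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names (pitcher, date, game_score) and the function name number_and_trim are hypothetical, and the layout of the actual database is not described in the article.

```python
from collections import defaultdict

def number_and_trim(game_lines, skip=30):
    """Tag each game line with the pitcher's career start number and
    drop each pitcher's first `skip` starts (field names are assumed)."""
    counts = defaultdict(int)
    kept = []
    # Process lines in date order so the career numbering is correct.
    for line in sorted(game_lines, key=lambda g: g["date"]):
        counts[line["pitcher"]] += 1
        numbered = dict(line, start_no=counts[line["pitcher"]])
        if numbered["start_no"] > skip:
            kept.append(numbered)
    return kept

# Toy data: pitcher "a" has four starts, pitcher "b" has two.
lines = [
    {"pitcher": "a", "date": "1950-04-18", "game_score": 61},
    {"pitcher": "b", "date": "1950-04-19", "game_score": 48},
    {"pitcher": "a", "date": "1950-04-23", "game_score": 35},
    {"pitcher": "b", "date": "1950-04-24", "game_score": 52},
    {"pitcher": "a", "date": "1950-04-28", "game_score": 70},
    {"pitcher": "a", "date": "1950-05-03", "game_score": 44},
]
trimmed = number_and_trim(lines, skip=2)
```

With skip=2 on the toy data, only the third and fourth starts of pitcher "a" survive; the study's cutoff corresponds to skip=30.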

Sort those, first, by the pitcher’s performance (Game Score) in his previous start. In the top 10% there are 17,984 games. The pitchers who had pitched best in their previous starts went 7,486-6,467 with a 3.55 ERA, while those who had pitched worst in the previous start went 6,328-6,680 with a 4.23 ERA:

Group	Games	Won	Lost	WPct	A GS	IP	H	BB	K	ERA
A1	17984	7486	6467	.537	54.41	121970.2	115123	37427	83635	3.55
B1	17984	7156	6369	.529	52.71	117362.2	114433	36767	77687	3.77
C1	17984	6977	6350	.524	52.30	116155.2	113245	36995	75307	3.84
D1	17985	6949	6449	.519	51.70	114741.0	113241	37350	73653	3.92
E1	17985	6743	6512	.509	51.11	113219.2	112942	37565	70972	3.99
F1	17985	6587	6591	.500	50.60	112008.1	112979	38008	70029	4.08
G1	17985	6540	6620	.497	50.71	111756.1	112287	37516	69249	4.05
H1	17984	6389	6646	.490	50.17	110327.0	111832	37987	67241	4.13
I1	17984	6305	6743	.483	49.91	109769.1	111560	38259	65320	4.17
J1	17984	6328	6680	.486	49.62	109961.0	112767	38240	66426	4.23

Group A1 is the pitchers who had pitched best (Group A) in their previous ONE (1) start; A2 is those who had pitched best in their previous two starts, etc. I have more data than this, but the additional data doesn’t fit. The "A1" group pitchers pitched 4,406 complete games including 1,049 shutouts, and the won-lost record of their TEAMS, as opposed to the pitchers themselves, was 9,589-8,376 (19 ties).
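For readers who want to reproduce the grouping, here is a minimal Python sketch of sorting game lines by a prior-performance value and cutting them into ten groups labelled A through J. The field names (prev_gs, won, lost) and both function names are hypothetical, and the exact way the article's groups were cut at the boundaries is not stated, so group sizes may differ by one from the tables.

```python
import string

def decile_groups(rows, key, n_groups=10):
    """Sort rows best-first by rows[i][key] and split them into
    n_groups nearly equal groups labelled A, B, C, ..."""
    ranked = sorted(rows, key=lambda r: r[key], reverse=True)
    n = len(ranked)
    groups = {}
    for g in range(n_groups):
        lo = g * n // n_groups
        hi = (g + 1) * n // n_groups
        groups[string.ascii_uppercase[g]] = ranked[lo:hi]
    return groups

def win_pct(group):
    """Pitcher winning percentage for a group; None if no decisions."""
    won = sum(r["won"] for r in group)
    lost = sum(r["lost"] for r in group)
    return won / (won + lost) if won + lost else None

# Toy data: 20 game lines whose previous-start Game Score is 0..19.
rows = [{"prev_gs": s, "won": int(s >= 10), "lost": int(s < 10)}
        for s in range(20)]
groups = decile_groups(rows, "prev_gs")
```

Here groups["A"] holds the lines with the best previous start and groups["J"] the worst; the same function works for any prior-performance column.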

Anyway, we can see that the previous start predicts the next one to a limited extent, and in the next chart we can see that the extent to which previous performance predicts next-start performance increases when we use the previous two starts, rather than one:

Group	Games	Won	Lost	WPct	A GS	IP	H	BB	K	ERA
A2	17984	7781	6243	.555	55.30	123183.2	114253	36920	87266	3.44
B2	17984	7287	6395	.533	53.14	118501.2	114329	37111	78824	3.71
C2	17984	7164	6272	.533	52.67	116776.1	112992	37085	76150	3.77
D2	17985	6816	6436	.514	51.62	114422.1	113256	37241	72794	3.92
E2	17985	6608	6712	.496	50.92	113539.2	114229	37603	70928	4.03
F2	17985	6686	6531	.506	51.01	112891.1	112852	37627	69890	3.99
G2	17985	6500	6627	.495	50.14	111236.1	113010	38075	68153	4.16
H2	17984	6401	6785	.485	50.08	110421.1	111862	38094	66765	4.15
I2	17984	6188	6705	.480	49.52	108860.2	111736	38115	65035	4.24
J2	17984	6029	6721	.473	48.84	107438.1	111890	38243	63714	4.36

A GS is "Average Game Score". The predictive power of the previous games increases again when we go from two starts to three:

Group	Games	Won	Lost	WPct	A GS	IP	H	BB	K	ERA
A3	17984	7869	6204	.559	55.86	124107.1	113545	37284	89691	3.38
B3	17984	7390	6255	.542	53.61	119265.1	113821	37350	80069	3.64
C3	17984	6958	6495	.517	52.40	116823.0	114391	37156	76130	3.81
D3	17985	7045	6397	.524	51.99	115486.1	113406	37210	73051	3.85
E3	17985	6777	6512	.510	51.21	113605.0	113380	37277	71642	3.98
F3	17985	6612	6571	.502	50.75	112537.1	113134	37376	69182	4.05
G3	17985	6386	6677	.489	50.28	111315.0	112666	37867	67623	4.12
H3	17984	6369	6648	.489	49.88	109982.1	112125	38208	66094	4.17
I3	17984	6229	6771	.479	49.11	108340.2	112354	38153	63935	4.30
J3	17984	5825	6897	.458	48.14	105809.1	111587	38233	62102	4.50

Whereas the 10% of starting pitchers who had pitched well in their previous ONE start had a .537 winning percentage and a 3.55 ERA, the 10% of pitchers who had pitched well in their previous THREE starts had a .559 winning percentage and a 3.38 ERA, and whereas the 10% of pitchers who had pitched most badly in their previous one start had a .486 winning percentage and a 4.23 ERA, those who had pitched most poorly in their previous three starts had a .458 winning percentage and a 4.50 ERA. More information is better; the predictive power increases.
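A sketch of how "performance over the previous N starts" might be computed for one pitcher, in Python. It assumes the measure is the simple average of Game Scores over the N starts before the current one; the article does not spell out the exact measure, and the function name prev_n_average is hypothetical.

```python
def prev_n_average(scores, n):
    """For one pitcher's Game Scores in chronological order, return a
    list whose entry i is the mean of the n starts before start i,
    or None when fewer than n earlier starts exist."""
    out = []
    for i in range(len(scores)):
        if i < n:
            out.append(None)
        else:
            window = scores[i - n:i]  # the n starts just before start i
            out.append(sum(window) / n)
    return out

scores = [40, 60, 50, 70, 30]
prev1 = prev_n_average(scores, 1)
prev3 = prev_n_average(scores, 3)
```

Note that the current start is never part of its own window, so the value can be used to sort game lines into groups before looking at how the pitcher did in that start.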

This continues to be true as we add more starts—up to at least 15 starts. Since really only the "A" group and the "J" group are interesting, I will trim the charts to just those:

Group	Games	Won	Lost	WPct	A GS	IP	H	BB	K	ERA
A1	17984	7486	6467	.537	54.41	121970.2	115123	37427	83635	3.55
A2	17984	7781	6243	.555	55.30	123183.2	114253	36920	87266	3.44
A3	17984	7869	6204	.559	55.86	124107.1	113545	37284	89691	3.38
A4	17984	8014	6035	.570	56.41	125072.0	113309	37386	91922	3.31
A5	17984	8061	6022	.572	56.77	125538.0	112768	37321	93237	3.27
A6	17984	8114	6000	.575	57.01	126016.1	112622	37200	94439	3.25
A7	17984	8170	5929	.579	57.09	126194.1	112588	37416	95276	3.25
A8	17984	8124	5941	.578	57.20	126371.0	112532	37477	95882	3.23
A9	17984	8186	5934	.580	57.37	126682.0	112420	37635	96889	3.22
A10	17984	8254	5938	.582	57.49	126822.0	112245	37546	97149	3.20
A11	17984	8264	5908	.583	57.73	127260.2	112098	37690	97819	3.17
A12	17984	8295	5923	.583	57.80	127486.2	112287	37584	98498	3.16
A13	17984	8295	5914	.584	57.84	127482.0	112197	37622	98673	3.16
A14	17984	8357	5902	.586	58.00	127778.0	112032	37652	99013	3.14
A15	17984	8382	5887	.587	58.05	127938.1	111912	37835	99256	3.14

In this chart, then, we can see that the predictive power of the previous starts increases when more starts are considered; that is, the winning percentage of the "best" pitchers improves, and the ERA declines, as more starts are considered, up to 15. Also, when we look at the worst pitchers, they continue to get worse:

Group	Games	Won	Lost	WPct	A GS	IP	H	BB	K	ERA
J1	17984	6328	6680	.486	49.62	109961.0	112767	38240	66426	4.23
J2	17984	6029	6721	.473	48.84	107438.1	111890	38243	63714	4.36
J3	17984	5825	6897	.458	48.14	105809.1	111587	38233	62102	4.50
J4	17984	5803	6943	.455	47.86	105401.1	112056	38155	60987	4.54
J5	17984	5778	6896	.456	47.56	104591.2	111897	38231	60107	4.60
J6	17984	5730	6924	.453	47.35	104018.0	111640	38230	59474	4.64
J7	17984	5712	6970	.450	47.16	103810.1	112012	38418	58909	4.67
J8	17984	5640	7022	.445	46.98	103305.2	111984	38149	58341	4.71
J9	17984	5552	7024	.441	46.78	103156.0	112183	38484	58200	4.76
J10	17984	5536	7112	.438	46.73	103181.2	112333	38342	57968	4.76
J11	17984	5559	7064	.440	46.69	103097.0	112483	38366	57902	4.76
J12	17984	5534	7125	.437	46.56	102845.2	112468	38373	57762	4.80
J13	17984	5503	7123	.436	46.51	102689.2	112374	38380	57411	4.81
J14	17984	5528	7088	.438	46.42	102539.0	112545	38428	57449	4.83
J15	17984	5556	7122	.438	46.31	102351.1	112764	38354	57250	4.85

The predictive power of the last six starts is twice the predictive power of the last one start, but the predictive power of the last 15 starts is only 20% greater than the predictive power of the last six starts. The predictive value of each additional start is less than the predictive value of the previous one.
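One way to check these ratios against the tables is to measure "predictive power" as the gap between the A and J groups. That choice of measure is an assumption, since the article does not say exactly how the comparison was made. A short Python sketch using the WPct and ERA figures from the A1, A6, A15, J1, J6 and J15 rows:

```python
# (WPct, ERA) copied from the A and J rows of the tables above.
best = {1: (0.537, 3.55), 6: (0.575, 3.25), 15: (0.587, 3.14)}
worst = {1: (0.486, 4.23), 6: (0.453, 4.64), 15: (0.438, 4.85)}

def gaps(n):
    """Return (WPct gap, ERA gap) between the A and J groups for n starts."""
    return best[n][0] - worst[n][0], worst[n][1] - best[n][1]

def growth(n_from, n_to):
    """How many times larger each gap is at n_to than at n_from."""
    w0, e0 = gaps(n_from)
    w1, e1 = gaps(n_to)
    return w1 / w0, e1 / e0

for a, b in [(1, 6), (6, 15)]:
    w, e = growth(a, b)
    print(f"{a} -> {b} starts: WPct gap x{w:.2f}, ERA gap x{e:.2f}")
```

On these figures the ERA gap roughly doubles from one start to six (about 2.0 times) and grows by a little over 20% from six starts to 15, while the winning-percentage gap grows about 2.4 times and then about 1.2 times.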

After 15 to 20 starts, the charts flatten out to such an extent that it is difficult to say with confidence that any additional gains are being made. Two other things are happening, beyond the natural law of diminishing returns. Since a pitcher makes only about 30 starts in a season, after 15 starts about half of the "old" starts we are adding into the data are from the previous season. It is likely that last year’s data has less predictive value than this year’s data, even if this year’s data is three months old.

Also, there is a technical issue with using the Game Score, unadjusted, as the indicator of how well the pitcher has pitched, since the Average Game Score by a pitcher is higher in 1968 than in the steroid era. This is a small effect, and it isn’t an actual problem in the part of the study where we are measuring noticeable effects, but as the effects being measured grow smaller, the technical issue becomes more significant relative to the effects being measured. As a consequence of these things, after 15 starts some measures go one way and some go another, and it is unclear whether we’re actually gaining any more useful information or not. This chart compares the data for the last five starts with the data for the last 10, 15, 20, 25 or 30:

Group	Games	Won	Lost	WPct	A GS	IP	H	BB	K	ERA
A5	17984	8061	6022	.572	56.77	125538.0	112768	37321	93237	3.27
A10	17984	8254	5938	.582	57.49	126822.0	112245	37546	97149	3.20
A15	17984	8382	5887	.587	58.05	127938.1	111912	37835	99256	3.14
A20	17984	8338	5858	.587	58.15	127927.1	111756	37984	100165	3.12
A25	17984	8329	5937	.584	58.19	128074.2	111867	37721	100592	3.13
A30	17984	8393	5917	.587	58.31	128204.1	111756	37738	100877	3.11

Group	Games	Won	Lost	WPct	A GS	IP	H	BB	K	ERA
J5	17984	5778	6896	.456	47.56	104591.2	111897	38231	60107	4.60
J10	17984	5536	7112	.438	46.73	103181.2	112333	38342	57968	4.76
J15	17984	5556	7122	.438	46.31	102351.1	112764	38354	57250	4.85
J20	17984	5552	7111	.438	46.43	102754.2	112716	38087	57482	4.83
J25	17984	5522	7034	.440	46.27	102351.0	112901	38091	57087	4.86
J30	17984	5501	7060	.438	46.19	102273.1	112943	38122	56993	4.87

Probably the data continues to gain predictive significance after 15+ starts have passed, but the gains are very small and we cannot be certain that they are real.

So the answer to your question, at this point, is that the last year’s data is a better predictor of pitcher performance than the last five games, but that one should not ignore the last five games, either. Suppose that a pitcher has pitched moderately well over the last 30 games, but extremely poorly over the last five? Then the short-term effects might be more important than the longer-term effects, and perhaps the answer is that one should go with the hot hand.

Or not.

I did one more study. Suppose that we compare a pitcher who has been pitching well lately but has not pitched well over a longer term with a pitcher who has pitched well over his last 30 starts, but has pitched poorly over his last five starts?

I figured for each pitcher his G5 – G30; that is, his average Game Score over his last 5 starts minus his average Game Score over his last 30 starts, but with the additional qualification that to be in the top group the pitcher must have genuinely pitched poorly over his last 30 starts, and to be in the bottom group the pitcher must have genuinely pitched poorly over his last 5 starts.
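The G5 – G30 figure and the two qualified groups can be sketched in Python as below. The cutoff for "genuinely pitched poorly" is not given in the article, so the poor=45.0 threshold is a purely hypothetical placeholder, as are the function names and group labels.

```python
def g5_minus_g30(scores):
    """scores: one pitcher's Game Scores before a given start, oldest
    first. Returns (g5, g30, g5 - g30), or None when there are fewer
    than 30 prior starts."""
    if len(scores) < 30:
        return None
    g5 = sum(scores[-5:]) / 5
    g30 = sum(scores[-30:]) / 30
    return g5, g30, g5 - g30

def classify(g5, g30, poor=45.0):
    """Label a start by the article's two qualified groups.
    `poor` is an assumed cutoff, not a value taken from the study."""
    if g30 <= poor and g5 > g30:
        return "good_lately_poor_over_30"
    if g5 <= poor and g5 < g30:
        return "poor_lately_good_over_30"
    return None

cold = g5_minus_g30([60] * 25 + [30] * 5)  # strong record, five bad starts
hot = g5_minus_g30([40] * 25 + [70] * 5)   # weak record, five good starts
```

To mimic the study, one would compute the difference for every qualifying start, keep only the classified ones, and take the 1,000 largest and 1,000 smallest differences.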

Conclusion? You want the pitcher who has pitched well over his last 30 starts—absolutely and without question. I looked at the 1000 most extreme examples on each end.

Among the pitchers who had pitched well over their last 30 starts but poorly over their last five was, for example, Marty Pattin in his start of May 8, 1973. In his previous five starts (April 16 to May 4, 1973) he had lost all five, had pitched less than four innings per start, and had an ERA of 11.17. But in his last 30 starts (June 20, 1972 to May 4, 1973) he had pitched 220 innings and had gone 16-11 with a 3.07 ERA. Pattin pitched very well in that game, although he lost the game 1-0.

Second example: Gaylord Perry in his start of August 15, 1974. In his previous five starts he had pretty much been pounded every time, giving up a total of 44 hits and 31 runs in 38 and a third innings, and losing all five starts. But in his previous 30 starts he was 20-7 with a 2.27 ERA, meaning that in the 25 starts BEFORE the last five he was 20-2 with an ERA under 1.50. Gaylord pitched well in his start of August 15, and won the game.

On the other end we have, for example, Brian Bohannon in his start of September 15, 1999. In his previous five starts he had pitched 8 innings, 9 innings, 7 innings, 8 innings and 8 innings, and his ERA for those five starts was 2.25, with 32 strikeouts and 12 walks in 40 innings. But over his last 30 starts, although he was 12-11, he had pitched 182 innings, struck out 110, walked 82, and had a 5.79 ERA.

Bohannon was hit hard in his start of September 15—and the next one, and the next one, and the next one, and the next one. He was Brian Bohannon; he was what he was. It’s baseball. The cream doesn’t ALWAYS rise to the top, but the sand always falls to the bottom.

Comparing 1,000 pitchers in each group, the pitchers who had pitched poorly in their last 5 starts but had pitched well over their last 30 starts had a Winning Percentage in their next start of .538, and an ERA of 3.40. The pitchers in the opposite group had a Winning Percentage of .438, and an ERA of 4.70.

COMMENTS (16 Comments, most recent shown first)

evanecurb
I had wondered how Gaylord Perry managed to lose 16 games during that 1974 season. Now I know. If I remember correctly, he had a lengthy winning streak that year in mid summer: 14 or 15 in a row, something like that. For Cleveland, no less.
9:54 AM Sep 5th

evanecurb
Whenever I think about managers selecting pitchers to start big games, I am reminded of an interview with Casey Stengel a few years before he passed away. They asked him why he started Johnny Kucks in the seventh game of the 1956 World Series, and he said (as only Casey could) "Well the fella was from New Jersey and I'll tell ya how they celebrated. They just put the keg of beer on the sidewalk and danced in the streets."

So now you know how Casey did it.
9:03 PM Sep 4th

bjames
It's been a few years and I'm not sure how relevant it is, but I once did a study of pitchers with losing records who started a World Series game. You know this has been a few years, because nobody would do that study now; now you'd study all post-season games, and you probably wouldn't use the won-lost record. Anyway, the conclusion was; they get murdered. You look at pitchers with losing records who wind up starting a World Series game . ..BAD results. I would assume that most or many of those pitchers started a World Series game because they were perceived to be pitching well recently.
11:36 AM Sep 3rd

flyingfish
MarisFan61: I am not sure I agree with your last comment. How often do you hear managers say "Well, I'm going with the guy who has the hot hand" when discussing how to choose the rotation for a critical series? I hear it a LOT.

As for Zack Greinke, I wish he were pitching for my team. :)
6:17 PM Sep 2nd

MarisFan61
I thought it would be good to try to add, as a point of reference for the article, what it is that teams have actually been doing in such situations. I think probably the way such actual decisions have been made for such post-season games mostly follows what Bill finds, but with some role of things like what some of us have brought up.

By far for the most part, teams do follow longer-term success. In particular, if they have a guy who they've considered for a while to be their "ace," it's not easy to move him off the chosen spot. Absent that, there's a strong drive toward picking a guy who's been doing it all year.

But, there are exceptions. In line with what Bill finds, having a few bad games near the end of the season wouldn't generally take a guy off the spot. But, if it felt there's something physically wrong with him, whether injury or tired arm or whatever it might be, or even if he "hasn't been looking good" (whatever that might mean) in those games, he could be bumped or even removed.

So, I think this study actually supports what's usually done. Does it argue against the occasional exceptions? I think not particularly. It says that in general teams are best off to follow longer-term success. I think that's what they generally do, and when there are exceptions, I think it's a case-by-case kind of thing that might be hard to study systematically.
1:03 PM Sep 2nd

MarisFan61
Thanks for addressing it so specifically -- you really did. I would add -- and I wish I could put this in tiny print, like we can do in Reader Posts, because it's probably only worth a footnote if anything, that I see an additional quibble. Rather than taking up any further space and rhetoric here, I'll put it in a "Hey Bill" to see if you think it's anything.

I love the Greinke example because it shows how there can always be a "yes but" about such things, although we can gather from this study that there are probably fewer "yes buts" than might have been thought.
11:27 PM Sep 1st

bjames
And, of course, one can sometimes "see" changes in a pitcher's skill level without relying on the stats. Zack Greinke in 2008 was kind of a .500 pitcher most of the year, but I moved back from Boston to KC in August, 2008, and went to a game in KC. I happened to have really good seats behind the first base dugout, and after the game I wrote back to the guys in Boston that Greinke had become the best pitcher in baseball, which he demonstrated the next season that he was. You could just see it. . .he was throwing 96 with no effort and had A+ command of every pitch. I remember a similar game that Dave Stewart pitched in KC (against KC) in late 1984; Dave Stewart in 1984 was 7-14 with a 4.73 ERA, but I remember thinking "I don't know what's up with the record, but that was one hell of a pitcher I saw tonight." He actually didn't get it turned around until mid-season 1986, but it was always there.
11:04 PM Sep 1st

bjames
Responding to the late-season concern as best I can, I took a copy of the data and eliminated from it everything except September and October starts. Since there were 179,000+ data lines in the full-season study, this left us with 31,105 data lines in the September/October version of the study, or 3,110/3,111 data lines in each 10% "file" such as A1, A2, B1, J30, etc. This wasn't difficult to do; all I had to do was eliminate the pre-September starts and re-calculate the results; just took me 30 to 45 minutes.

You may think that 3,000 data lines is a good number, but you certainly do get some instability in the data, with 3,000 lines in each file, that you would not get with 18,000 in each file. There are also two other changes noted: 1) that all of the ERAs are lower and the average game scores higher, since September tends to be a pitcher's month, and 2) that there is more decentralization in the data--that is, the top and bottom move further away from one another--due to the "September decentralization." This time of year, a few teams just more or less quit.

Aside from those effects, the data patterns in the September study are the same as noted in the larger study. The data flattens out after about 15 starts; however, the pitchers who have been good over the last 30 starts are substantially better in their next start than those who have been good over their last 4 or 5 starts, and 30 starts is probably (though less clearly) a better "read" on the pitcher's current effectiveness than is 15 starts.

Of course pitchers' levels of effectiveness sometimes change, particularly for young pitchers. Randy Johnson in 1995 is not the same as Randy Johnson in 1992; Roy Halladay in 2001 was not at all the same as Roy Halladay in 2000, and Zack Greinke in 2005 was 5-17 with a 5.80 ERA. It is a question of what you bet on. You don't bet on the last 4-5 starts; you bet on the last 15 or the last 30.
10:43 PM Sep 1st

flyingfish
Sorry, you actually said more information is USUALLY better than less. I need to read more carefully.
6:52 PM Sep 1st

flyingfish
Thanks for answering my question, Bill. Very interesting indeed. As you said in the beginning of your article, more information always is better than less, and so I think when we get to the way MarisFan61 rephrased my question to focus on the postseason, or indeed the question as I originally phrased it, we do have additional information. For example, Joe Kelly of the Red Sox has not been very good overall this season but his last several (5?) starts have been very good. Will he revert to the old Joe Kelly? Well, we know that he says that in his recent trip to AAA, he LEARNED TO PITCH DIFFERENTLY. Other players, both teammates and opponents, have said he looks like a better pitcher recently. I'm not quite sure just how much to make of that information, but it certainly helps to some degree; you wouldn't be using only his statistical record because you know some other, relevant, things. Similarly with a pitcher who is pitching through a seemingly minor injury and then ends up on the disabled list (Steven Strasburg comes to mind, more than once). If HE starts pitching badly, then I'm going to need to find out if he's hurt. Thanks again.
6:08 PM Sep 1st

DanaKing
As always, lots of interesting stuff here. Thanks for taking the time on this, Bill.

To me, what this shows more than anything is that, while recent stats can clearly show if a player is hot or cold, it's almost impossible to show whether said player will remain hot or cold in his next game. There are too many variables, not the least of which is human frailty. (Some days or weeks people just feel better or worse physically, and it affects their performance.)
12:52 PM Sep 1st

stevekohlhagen
great analysis, bill.

however, as we all know, no matter how much you show objective evidence that forecasting future performance off of someone's subjective notion of who's hot and who's not doesn't work, some people will resort to emotional claims that THEY can. despite all evidence to the contrary.

human nature i guess.

great analysis. thanks. swk
12:30 PM Sep 1st

78sman
Bill, this is great information.

MarisFan, your point is good too. Data do not always tell us why a pitcher performed well or poorly in recent starts, so Bill's analysis did not account for reasons why a pitcher had done well or poorly in recent starts. If Bill had been able to separate out pitchers who had an important nagging injury or a dead arm, then one would expect different results for that sub-group.

I love Bill's analysis, and I think that MarisFan has added an important caveat.
11:18 AM Sep 1st

MarisFan61
P.S. Interestingly, while in general it's no problem for such a study to ignore pitchers who haven't been around for a full season and to eliminate the first 30 starts of a pitcher's career from consideration, if we're looking most particularly at the coming post-season -- which I do think was the main thrust of the question -- there is indeed at least one such pitcher who may be a prominent part in such a choice: Luis Severino of the Yankees.
11:14 AM Sep 1st

MarisFan61
Yes, but..... :-)
I think this way of doing the study might be sort of drowning the specific thing we're looking for. I don't know that it makes a difference but it's not hard to imagine that it might.

It depends on what we're really looking for, and that depends on how we see the original question:

If you were choosing a pitcher to start a critical game, say a one-game playoff or game 7 of a series, would you be more likely to choose the pitcher on your team who has the highest pitcher score or the pitcher who has pitched best over the past four or five starts? Or would you use other criteria or a combination of criteria?

Bill, it seems you took the question broadly: emphasizing the first phrase in the question, and taking the next part just as a fr'instance. I'd guess his emphasis was the other way around: that he was really talking mainly about the post-season and that the first phrase was just a lead-in, since it's a question that's 'in the air' a lot right now, because of the time of year, and what people almost always mean is the post-season.

What you looked at does answer the broad question. But if the emphasis was the other thing, I think it might well make a difference that we're talking about the end part of the season.

As I said, I don't know that it makes a difference but I can imagine it might: Pitchers are more likely to have tired arms toward the end of the season, and some pitchers do and some don't; obviously the ones who don't have tired arms have an advantage; and it may well lead to different correlations than looking at performance at all times of the season.

If one agrees with this and wanted to do a similar study looking at the question in this narrower way, I think there would be traps. It might seem like the most direct way would be to look at post-season performances by starting pitchers and seeing how they correlate to prior groups of their games. But, that would be fraught, because the way pitchers are and aren't used in the post-season takes into account how their arms are. Pitchers who have tired arms or sore arms would tend either not to have been used in that post-season or to have been given extra rest. Those things would tilt the data away from answering the specific thing that I think we're looking at.

I guess the point in what I'm saying is that the answer depends on why a pitcher with a good longer-term record has done worse lately, and that when such a thing happens late in the season, it may be more likely than usual (for any pitcher, good or bad long-term record) that it's related to his arm being compromised. For pitchers whose arms aren't particularly compromised, I'd say the answer is covered very well by this study. Otherwise, it's a different story. And actually in a way this is just a "d'oh" thing, because probably anybody would say "Of course you don't start a pitcher with a bad arm."

OK, now I better run and hide before being trampled.... :-)
10:59 AM Sep 1st

mrkwst22
....and that ladies and gentlemen is why Bill James is a National Treasure.
8:33 AM Sep 1st
