October Power

By Bill James

October 6, 2008

I had a question in the “Hey, Bill” section on September 25, from someone signing in as Jake, asking about the “notion that ‘power pitcher’ are better in the postseason (this is often used to explain why Smoltz was better than Glavine or Maddux in the playoffs). Aside from the obvious that good pitchers tend to throw the ball harder than bad ones, this one doesn’t seem to hold much water, but I’ve never seen any data on it.”

I told him that I would try to study the issue. I had in mind a quick 5- or 7-hour study of the issue, but it turned into a time-consuming monster that probably took me 35 hours to do. . .like that affects your life or the value of the study. Anyway, I was not the first person to study this. Someone sent me a link to a study by Nate Silver, asking the more general question “What are the characteristics of teams which succeed in the post season?”, in which Silver found that teams with power pitchers did well.

We should say at the top that obviously this bias in the data could not explain the post-season effectiveness of a pitcher like Smoltz or Schilling. There may be a bias in post-season in favor of power pitchers, but obviously it could not be large enough to cause John Smoltz to go 15-4 with a 2.65 ERA in 207 post-season innings.

Anyway, I decided to study this with a matched set study, taking the quality of the pitchers out of the equation by very carefully matching power pitchers with equally good finesse pitchers. I started by identifying all pitchers in the years 1969 through 2001 who made at least one post-season start (starting in 1969 because that is when the playoffs start, giving us a more useful number of pitchers appearing in post season, and ending in 2001 because when I got through 2001 I knew that I had enough pitchers to make the study work.) There were 587 pitchers who made a post-season start in those years. . .not 587 different pitchers, but 587 if you count Bob Welch, 1981, Bob Welch, 1983, Bob Welch, 1985, Bob Welch, 1988, Bob Welch, 1989, Bob Welch, 1990, and Bob Welch, 1992, as seven pitchers.

I assigned each of these pitchers a “power score”, by this formula:

2 Times Strikeouts

+ Walks

+ 2 Times (Strikeouts above the league average)

Per Nine innings.

Randy Johnson in 2001 had 372 strikeouts, 71 walks in 249.2 innings. An average pitcher in that league would have struck out 194 batters in 249.2 innings, so the Unit was +178 strikeouts. That makes his “Power Score”:

((2*372) + 71 + (2*178)) * 9 / 249.667

This makes a Power Score of 42.22, which was the highest of any pitcher in the study. The lowest Power Score of any pitcher in the study was 1.36, by Mike Flanagan in 1982.
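
The Power Score arithmetic can be sketched in a few lines of Python. The function and parameter names here are illustrative and do not come from the study; innings are converted to decimal form, so 249.2 in box-score notation becomes 249 and two-thirds.

```python
def power_score(strikeouts, walks, league_avg_strikeouts, innings):
    """Power Score per nine innings, following the formula described above.

    league_avg_strikeouts is the number of strikeouts a league-average
    pitcher would record in the same number of innings.
    """
    strikeouts_above_avg = strikeouts - league_avg_strikeouts
    raw = 2 * strikeouts + walks + 2 * strikeouts_above_avg
    return raw * 9 / innings


# Randy Johnson, 2001: 372 SO, 71 BB, league-average 194 SO, 249.2 IP
johnson_2001 = power_score(372, 71, 194, 249 + 2 / 3)
print(round(johnson_2001, 1))
```

With the rounded inputs quoted above this comes out at about 42.2. The small gap from the 42.22 reported in the study presumably reflects rounding of the league-average strikeout figure.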

I then started forming matched sets of pitchers with very different Power Scores, but nearly identical records in other respects. I matched pitchers on eleven criteria:

Year (So that, in general, a pitcher from the 1970s would be more likely to be paired with another pitcher from the 1970s)

Age

Games Started

Wins

Losses

Innings Pitched

Runs Saved compared to league average

Career Innings Pitched

Career Wins

Career Losses

Power Score

The system is set up so that if two pitchers were identical in all of these areas their Similarity Score would be 1000.000. For every difference between them points are subtracted, except that for a difference in the Power Score points are ADDED, rather than subtracted, so that the highest-scoring candidate to be compared to each pitcher is a pitcher with very similar wins, losses, innings pitched, ERA, etc., but a very different Power Score.
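
A minimal sketch of that scoring idea follows. The article does not give the point values attached to each difference, so every weight below is a placeholder assumption, as are the field names; only the shape of the calculation (start at 1000, subtract for differences, add for the Power Score gap) comes from the description above.

```python
# Hypothetical penalty weights: the article does not give the point values,
# so these numbers are placeholders chosen only to make the sketch run.
PENALTY_WEIGHTS = {
    "year": 1.0,
    "age": 2.0,
    "games_started": 1.0,
    "wins": 3.0,
    "losses": 3.0,
    "innings": 0.2,
    "runs_saved": 2.0,
    "career_innings": 0.02,
    "career_wins": 0.5,
    "career_losses": 0.5,
}
POWER_BONUS_WEIGHT = 5.0  # also an assumption


def match_score(a, b):
    """Start at 1000, subtract for each difference, add for the Power Score gap."""
    score = 1000.0
    for field, weight in PENALTY_WEIGHTS.items():
        score -= weight * abs(a[field] - b[field])
    score += POWER_BONUS_WEIGHT * abs(a["power_score"] - b["power_score"])
    return score


# Made-up pitchers: identical records, very different Power Scores.
base = {field: 0.0 for field in PENALTY_WEIGHTS}
finesse = dict(base, power_score=8.0)
power = dict(base, power_score=30.0)
print(match_score(finesse, finesse))  # identical pitchers score exactly 1000
print(match_score(finesse, power))    # the Power Score gap pushes it above 1000
```

Because the Power Score term is added rather than subtracted, the best-scoring partner for any pitcher is the one who looks most like him everywhere except in strikeouts and walks.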

The best match in the study was Scott McGregor, 1979, representing the “finesse” camp, and Kerry Wood, 1998, representing the “power” department. Both pitchers were 13-6. McGregor gave up 65 earned runs in 174.2 innings, giving him a 3.35 ERA, while Wood gave up 63 earned runs in 166.2 innings, giving him a 3.40 ERA. The league ERAs were almost the same (4.23 and 4.24), so McGregor was 17 runs better than an average pitcher, Wood 16 runs. Kerry Wood was 21 years old at that time; McGregor was 25. The largest difference between them, other than power, was that McGregor had pitched 536 career innings with a 31-25 career record, whereas Wood was a rookie.

But whereas McGregor had struck out 81 men and walked 23, Wood had struck out 223 and walked 85. Huge difference.
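
The "runs better than an average pitcher" figures for McGregor and Wood can be reproduced with a short calculation. This is a sketch under two assumptions: the league ERAs of 4.23 and 4.24 are taken in the order quoted (McGregor first, Wood second), and innings in box-score notation (174.2) mean 174 and two-thirds. The function names are illustrative.

```python
def innings_to_decimal(ip):
    """Convert box-score innings (174.2 = 174 and 2/3) to a decimal value."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # the digit after the point counts outs
    return whole + outs / 3


def runs_saved(earned_runs, ip, league_era):
    """Earned runs a league-average pitcher would allow, minus the actual total."""
    innings = innings_to_decimal(ip)
    return league_era * innings / 9 - earned_runs


mcgregor_1979 = runs_saved(65, 174.2, 4.23)
wood_1998 = runs_saved(63, 166.2, 4.24)
print(round(mcgregor_1979), round(wood_1998))
```

Rounded to whole runs, these match the 17 and 16 given in the text.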

The second-best match was Randy Johnson, 2001, against Tom Glavine, 1998. Johnson was 21-6, 2.49 ERA; Glavine was 20-6, 2.47 ERA. Glavine was 32 years old with a career record of 173-105; Johnson was 37 years old, with a career record of 200-101.

This was not the only time that Glavine and Johnson were paired in the study. Randy Johnson in 1998 was also paired with Glavine in 2000. Johnson in 1997 (20-4, 2.28 ERA) was paired with Greg Maddux the same year (19-4, 2.20). Randy Johnson in 1999 (17-9, 2.48 ERA) was paired with Maddux in 1998 (18-9, 2.22). Maddux in various other years was paired with Pedro Martinez, Curt Schilling and David Cone. Maddux in 1989 (19-12, 2.95 ERA) is paired with Tim Leary in 1988 (17-11, 2.91). It looks silly in retrospect, but Maddux at the time had a career ERA of 3.77, Leary 3.78. Maddux in 1999 (19-9, 3.57 ERA) is paired with David Cone in 1995 (18-8, also a 3.57 ERA). David Cone in 1998 (20-7, 3.55 ERA) is paired with Jamie Moyer in 2001 (20-6, 3.43), and Cone in 1988 (20-3, 2.22 ERA) is paired with Orel Hershiser in 1985 (19-3, 2.03). At various other times Cone is paired with Ed Figueroa, Tom Glavine and Rick Reuschel.

Back-of-the-rotation guys have their matches as well. Scott Sanders, 1996 (9-5, 3.38 ERA) is paired with his actual teammate the same season, Andy Ashby (9-5, 3.23). Floyd Bannister, 16-10, 3.35 ERA in 1983, is paired with Zane Smith in 1991 (16-10, 3.20). Both pitchers had career records, at that time, of 67-78. Nolan Ryan is paired with Tommy John, Mike Scott with John Tudor, Juan Guzman with Larry Gura. Tim Lollar in 1984 (11-13, 3.91 ERA) is paired with Rick Camp in 1982 (11-13, 3.77). Dave Righetti (8-4 with a 2.05 ERA in 15 starts in 1981) is paired with Tim Wakefield (8-1 with a 2.15 ERA in 13 starts in 1995). Lance Painter in 1995 (3-0 with a 4.37 ERA) is paired with Bob Wolcott the same season (3-2, 4.42). Diego Segui in 1971 (10-8 in 21 starts, 3.14 ERA) is paired with Ray Burris in 1981 (9-7 in 21 starts, 3.05 ERA); their career records at the time were 74-83 and 72-83. Jack Morris, 1991, is paired with Tommy John, 1980; Morris was 18-12 to John’s 22-9, but both pitchers had 3.43 ERAs, and their career records were 214-141 and 216-162. Not everybody has a match, of course, but the pitchers who don’t match anybody are left out of the study.

The lowest-scoring “match” that qualified for the study was Mike Cuellar, 1971, with Don Sutton, 1974. Cuellar, 34 years old, had made 38 starts, pitched 292 innings with a record of 20-9. Sutton, 29 years old, had made 40 starts, 276 innings with a record of 19-9. Cuellar’s ERA was 3.08 against a league norm of 3.47, so he had saved 13 runs vs. an average pitcher. Sutton’s ERA was 3.23 against a league norm of 3.63, so he had saved 12 runs vs. an average pitcher. Cuellar had a career record at that time of 109-69; Sutton had a career record of 139-113. Even their walks were similar—78 for Cuellar, 80 for Sutton—but Cuellar, with 124 strikeouts, was 52 strikeouts below the American League norm in 1971, whereas Sutton, with 179 strikeouts, was 22 strikeouts above the National League norm in 1974.

In individual cases there were differences between the pitchers. In the aggregate, there were almost no indications of a difference in quality. The finesse pitchers were a little bit older, averaging 29.1 vs. 28.4 for the power pitchers. There were 100 pitchers in each group. The power pitchers had an aggregate won-lost record of 1519-831; the finesse pitchers, 1514-838. The power pitchers had pitched an average of 210.0 innings; the finesse pitchers, 211.1. Both groups had given up an average of 85.5 runs, with 77.1 or 76.8 of those earned, leading to ERAs of 3.33 (for the power group) and 3.29 (for the finesse group). Both groups had relative ERAs (ERA divided by league ERA) of .813. The power pitchers had made an average of 199 career starts, with career records averaging 79-58 and career ERAs of 3.50. The finesse pitchers had made an average of 223 career starts, with records averaging 84-62 and career ERAs of 3.54.

Our essential goal was that there would be no observable difference in the quality of the pitchers in the two groups. But the power pitchers had averaged 183 strikeouts, 76 walks; the finesse pitchers had averaged 107 strikeouts, 57 walks. The two groups were nearly even in terms of home runs allowed (a few more for the power pitchers), but the finesse pitchers had given up, on average, 18 more hits. 18 more hits, 19 fewer walks, one fewer homer. . .the same results overall.

Having formed these two nearly-identical groups of 100 pitchers (all of whom had made at least one start in post-season play), I then looked up their records in post-season play. To cut to the chase without ceremony, the Power Pitchers did in fact perform better in post-season play than did the most-equal Finesse Pitchers. The Power Pitchers made 222 starts in post-season play, with a won-lost record of 85-67 and an ERA of 3.35. The Finesse Pitchers made 214 starts in post-season play, with a won-lost record of 73-87 and an ERA of 3.59.

This difference is NOT statistically significant. The winning percentage of the 200 pitchers, in post-season play, was .50641. The chance that a group of pitchers with that winning percentage would go 85-67 or better is 6%, and the chance that a group of pitchers with that winning percentage would go 73-87 or worse is 9%. The difference in ERA is more difficult to test for significance, but it appears to be obviously smaller than the difference in Won-Lost records, and thus unlikely to meet a higher standard of statistical significance.
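
One way to run this kind of check is an exact one-sided binomial tail, sketched below. The article does not say how the 6% and 9% were computed, so this is only one plausible reading, and its output need not reproduce those figures; a normal approximation or a different treatment of the pooled percentage would give somewhat different numbers. The function names are illustrative.

```python
from math import comb


def binom_tail_at_least(wins, games, p):
    """P(X >= wins) for X ~ Binomial(games, p)."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))


def binom_tail_at_most(wins, games, p):
    """P(X <= wins) for X ~ Binomial(games, p)."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(0, wins + 1))


# Pooled post-season winning percentage of both groups: 158 wins in 312 decisions.
pooled = (85 + 73) / (85 + 67 + 73 + 87)

p_power = binom_tail_at_least(85, 85 + 67, pooled)   # 85-67 or better
p_finesse = binom_tail_at_most(73, 73 + 87, pooled)  # 73-87 or worse
print(round(pooled, 5), p_power, p_finesse)
```

Neither tail probability falls below the conventional 5% threshold, which is consistent with the conclusion that the difference in won-lost records is not statistically significant.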

In post-season play the Power Pitchers in our study faced 6,043 batters, pitching 1424.2 innings—essentially one team/season’s worth of pitching. The Finesse pitchers faced 5,684 batters, pitching 1343 innings. The power pitchers limited opponents to a post-season batting average of .227, whereas the finesse pitchers allowed a post-season batting average of .263—larger than the batting average difference between the groups in regular season. This difference was only partially offset by walks. The power pitchers walked 3.35 per nine innings in post-season play; the finesse pitchers, 2.69.

My conclusion is that it does appear to be probably true that Power Pitchers are more effective in post-season play, with an advantage in the neighborhood of 0.25 ERA, for pitchers of the same quality. However, the difference is small enough that these could be random effects.

One other note from the study: Six of the power pitchers won the Cy Young Award, while only two of the matched finesse pitchers were given the award.

COMMENTS (23 Comments, most recent shown first)

mgl
"It is an unstated assumption of research that others will be able to see flaws in any study, and to improve upon the research when they take up the issue. It is exactly this that makes knowledge different from BS."

That is true and an excellent point. When I publish or present research, I would hope that someone would care enough about the material to point out the flaws and mistakes and/or suggest improvements. Whatever their motivation. That doesn't matter to me.

In this case, my only agenda is to make sure that the information is as true as it can be.

Whatever Bill intended or did not intend is not particularly relevant. The only relevant and important thing is that people who read this article and those who "hear of it" are aware of the biases in the two samples and why they occurred, and thus, to be careful in drawing any conclusions.

Once you/we are aware of exactly what is going on, you can couch the conclusions any way you want and it won't change things.

As we have pointed out enough times already, the two groups are likely not equal in "true talent" for the reasons stated. It is also likely that when you "match them" for everything BUT BABIP (which is essentially a proxy for K rate once you control for everything else), you have created one group that was luckier than the other in their performance during the sample time period.

Therefore, when you look at any other time period, be it the post season or next season or the season before, both groups will revert towards their mean. One mean will be better than the other (the power pitchers), so therefore the power pitchers will likely post a better ERA (or RA, ERC, DIPS, FIP, etc.) in that out-of-sample time period.

That is exactly what is going on. As long as the readers understand that they can draw their own "conclusions" or mince words as they like.

If they want to say that that means that "If finesse pitchers and power pitchers perform the same during the regular season, according to ERA, W/L, HR rate, etc., the power pitchers will perform better in the post-season," that is fine by me, and is actually true. It is still important to understand WHY, however. And I have explained why.

If someone wants to call the 2 groups "equal in quality" as Bill did in his last paragraph, that is also fine by me, again, if they understand that the only thing equal about them was their performance as measured and matched by Bill - that their true quality is likely NOT equal.

Again, it does not matter how people want to couch things as long as they understand exactly what is going on.

I am being somewhat of an apologist here for Bill (and others) - as I do believe that it matters how things are "couched" (because the way a piece of research is presented influences how readers understand the dynamics of the research and in effect the way the world works with respect to that research). But that is somewhat of another story.

8:07 PM Oct 9th

bjames
It is an unstated assumption of research that others will be able to see flaws in any study, and to improve upon the research when they take up the issue. It is exactly this that makes knowledge different from BS.
7:38 AM Oct 9th

mgl
"just because Joe Morgan says it's true doesn't necessarily make it false."

Now that's the funniest, most worthwhile comment herein!
9:43 PM Oct 8th

jakeparsons
Thanks so much for studying this. I didn't think the evidence would lead to anything different than the regular season numbers would indicate. I'll keep in mind the possibility of random effects, but I suppose I've learned a valuable lesson that just because Joe Morgan says it's true doesn't necessarily make it false. Thanks again.
5:18 PM Oct 8th

evanecurb
Interesting study and well done. I am astonished that the AL ERA in 1979 and the NL ERA in 1998 are essentially the same. 1998 was the year of McGwire, Sosa, and Greg Vaughn, pre-humidor Rockies, and I thought it was a very high scoring league. I guess I was wrong.
4:57 PM Oct 8th

Richie
Well, as long as I'm 100% right. And I win in a landslide for using fewer words. Unless I lose in a landslide for using fewer words.

Clearly we're each wrong. You're saying "Bill clearly intends 'A'!", and I'm saying "Bill clearly intends 'B'!", while no one other than Bill has any business saying what Bill clearly intends. Your position is linguistic (here he says "equally good", here he says "quality"), mine is content-based (no stathead uses pitcher wins and losses if he's trying to do what you guys claim he's trying to do).

Either position gets swamped once/if/when Bill says "this is what I'm doing here". And the study is what it is and is usable for such-and-such purposes irrespective of his intentions or our views of them.

Which uses I don't know that we disagree on at all. It has sabermetric shortcomings if used to research pitcher quality (tho' I would question the simple use of wins and losses more than leaving out BABIP), and is usable for predicting one aspect of post-season play for those of us who read espn.com, but not the SABR web site.
11:32 AM Oct 8th

mgl
Richie, you are 100% right, but again, that is NOT what Bill is asking and thinks that he is answering.

To wit:

"I had a question in the “Hey, Bill” section on September 25, from someone signing in as Jake, asking about the “notion that ‘power pitcher’ are better in the postseason (this is often used to explain why Smoltz was better than Glavine or Maddux in the playoffs)."

He did not show that power pitchers were "better in the post-season." He showed that given the parameters he used to create the matched pairs, that power pitchers performed better in the post-season.

Clearly, if I ask the question, "Hey are power pitchers or finesse pitchers better in the post-season" I am implying, "Given the same true talent in the regular season or in their careers or as suggested by a good projection!"

I don't get to set up a study whereby I accidentally create one group who performed better because they were lucky and the other group was unlucky (which was EXACTLY what Bill did) and say, "Hey, I answered your question. Look, these groups performed exactly the same in the reg season (with the clear implication that they are of the same talent - which they are NOT), but one group performed better in the post-season."

How can I do that? That is not a very fair way to answer the question! Heck I could have easily set up the study to create two groups of pitchers with the finesse group having been unlucky and the power group having been lucky and guess who would have done better in the post-season? The only way my study is a fair answer to the question is if I set up an unbiased one, which Bill did not, by accident.

If we get to set up studies any way we want to create bias, which creates any answer we want, how can that be the right way to answer the question? Would you like me to set up a similar study whereby the finesse pitchers come out better in the post-season? I can. How could both conclusions be correct?

Finally, if you read the last part of the article, there is NO ambiguity about what he meant:

"My conclusion is that it does appear to be probably true that Power Pitchers are more effective in post-season play, with an advantage in the neighborhood of 0.25 ERA, for pitchers of the same quality."

PITCHERS OF THE SAME QUALITY! It could not be any clearer. He did not say or mean "pitchers who performed equally in the regular season." Unfortunately, his two groups were NOT pitchers of the same quality! I don't know how to make that any clearer to you, Richie or anyone else. Just because two or more pitchers have the same ERA or any other stat during any time period (say an entire regular season) does not make them of the same quality. Bill wanted pitchers of the same quality, exactly as he says in his italicized conclusion. He did not create two groups of the same quality because of the bias in the parameters he controlled for.

If his stated goal were to see what happens with two groups of pitchers who "performed the same during the regular season" but were not necessarily of the same quality, then I am 100% in agreement with you. But that is not the case, according to Bill himself.
1:54 AM Oct 8th

Richie
My reading of it is that Bill was looking for association, not cause. He - and Nate Silver, first - found a clear association. Probably, statistically speaking.

The cause(s) may have nothing to do with the post-season, per se. But many folks are very interested in predicting the post-season. If the effects on October are the exact same as the effects on next April, well, whatever. If all I'm interested in are predictive indicators for October.
11:44 PM Oct 7th

mgl
Clearly Bill is trying to answer the question, "Do power or finesse pitchers have an advantage in the post-season," and the answer is, "Maybe they do, but you did NOT address that question in your study!"

No, there is no controversy about pitchers and BABIP. Richie puts it quite well, when he says that it is "justthissideof" random. Every measure of a batter's and pitcher's performance, for any given sample size, has a certain element of luck (randomness) and a certain element of skill. We can figure out what the ratio is between luck and skill for any given sample size quite easily with some degree of certainty (for example, if we say that you regress K rate 50% after 100 TBF, it might be 45% or it might be 55%, but it ain't 80% and it ain't 20%).

BABIP for pitchers happens to be on the very low end of the "skill to noise ratio" scale, whereas BB and K rates are at the upper end. I don't know the exact number, but for a season's worth of data, it (BABIP) is no more than 10% skill. The fact that lefties and righties, GB and FB, knucklers and non-knucklers, finesse and power pitchers may have different mean BABIP as a group is a separate issue. In fact, there may be 10 different means for 10 different types of pitchers, but within those groups, the spread in true BABIP might be zero, or it might not be zero.

Anyway, we can levy the same criticism on this study without invoking BABIP at all:

We know that teams in the playoffs have pitchers AND hitters who performed better than average in the reg season. Therefore, by definition (anyone who does this kind of research or reads and understands it, knows this next point), the pitchers and the hitters in the post-season are better than average players AND got a little lucky in the reg season.

We also know for a fact that they will regress towards their true means in any out-of-sample period, including the post-season (or next regular season or the season before, or whenever).

Now, we know that power pitchers have better true talent than finesse pitchers. So they will regress less than the finesse pitchers!

IOW, let’s say that the mean ERA for power pitchers is 4.00 and for finesse pitchers, it is 5.00 (I am exaggerating a little of course), in a league where the average ERA is 4.50.

Now, let’s say that all pitchers who get into the post-season are around 4.00 (better than average). All of them will regress in the post-season simply because of regression to the mean (after adjusting for the quality of the hitters they face in the post-season).

But, the power pitchers will regress towards 4.00 and the finesse towards 5.00, so that in the post-season or any other out-of-sample time period, the finesse pitchers that were 4.00 in the reg season will be around 4.25 in the post-season (if they regress 50%) and the power pitchers will still be 4.00 in the post-season (4.00 in reg season and regressing towards 4.00).

So whether you focus on BABIP or not, the power pitchers will regress towards a lower mean than the finesse pitchers. That is the problem, by the way, with using a “matched pair” method when the two groups have unequal means (in true talent). In ANY out of sample time period, the group that has the lower mean ERA (in this case, the power pitchers) will ALWAYS have a lower ERA!

That is true even if you looked at average pitchers in the regular season, but you still used matched pairs. For example, let’s say that in the reg season, all of these post-season pitchers performed at league average, or 4.50. Well, the finesse pitchers would regress towards 5.00 and the power pitchers would regress towards 4.00 so in any out-of-sample period the power pitchers will do better within those matched pairs.
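
The regression argument can be written out as a one-line calculation. This is a sketch using the deliberately exaggerated group means from the examples above (4.00 for power pitchers, 5.00 for finesse pitchers); the function name and the regression fractions are illustrative assumptions, not measured values.

```python
def regress_to_mean(observed_era, group_mean_era, regression):
    """Shrink an observed ERA toward its group mean.

    regression is the fraction of the gap closed: 0 means no regression,
    1 means the expectation is simply the group mean.
    """
    return observed_era + regression * (group_mean_era - observed_era)


POWER_MEAN, FINESSE_MEAN = 4.00, 5.00  # hypothetical group means

# Both pitchers posted a 4.00 ERA in the matching period.
print(regress_to_mean(4.00, POWER_MEAN, 0.5))     # power pitcher stays at 4.00
print(regress_to_mean(4.00, FINESSE_MEAN, 0.25))  # a quarter of the way: 4.25
print(regress_to_mean(4.00, FINESSE_MEAN, 0.5))   # halfway: 4.50

# Both pitchers posted a league-average 4.50 ERA in the matching period.
print(regress_to_mean(4.50, POWER_MEAN, 0.5))     # 4.25
print(regress_to_mean(4.50, FINESSE_MEAN, 0.5))   # 4.75
```

Whatever regression fraction is chosen, two pitchers matched on observed ERA end up with different expected ERAs as soon as their group means differ, which is the point of the critique.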

You CANNOT do matched pairs studies like he did when the two groups have different mean true ERA’s and then look at what happens to their ERA’s from one time period (in which you matched the players) to another time period.

So technically Bill is right that the power pitchers performed better, but it HAS nothing to do with the post-season or the fact that both groups are facing above average batters, which was the whole point of his study, was it not? To see whether one group or the other has an “advantage” in the post-season?

While the answer to that question is probably “no” it could be “yes” but you cannot tell from his study. Not at all.
7:31 PM Oct 7th

tangotiger
Richie,

I fully expect this particular research piece to make it in the next Gold Mine. And to that end, I would hope that Bill is very appreciative of the peer review he receives on this site, and improves the study.

The entire point of the comments section is to let people put in their two cents. Telling me that I should lower my expectations for whatever invented reasons you wish to offer is offensive to both the writer and the reader.
7:23 PM Oct 7th

bsol007
Interesting idea. Plainly, it is difficult to apply this to earlier times in baseball, since, back then, the only "postseason" games were the World Series.
I was wondering if some way could be found to study whether today's fastest ballplayers differ from their counterparts of, say, 80 years ago. If today's fastest runners are appreciably quicker, then likely they would be stealing more bases?
7:08 PM Oct 7th

Richie
Meeting got canceled. Back to not-working.

Is BABIP actually a hot issue? I thought it was rather established that it was justthisside of random.

Given that he stuck wins and losses in there - twice, as a matter of fact (career, too) - I took this to mean Bill intended it as a study of popular rather than stathead metrics. The type of stat every reader of this site and his baseball fan little sister recognizes and has easy access to. A study Peter Gammons could easily grab onto.

I just don't see 'bias' or 'flaws' unless you ascribe to Bill a task beyond what he's aspiring to here, from what I can see. My understanding was that this is not a sabermetric site, per se, but one on which we'll get some such research. I welcome Bill's quick-and-easy studies, likewise the one Matt Namee gave us on laser surgery. Holding them to certain standards of comprehensiveness will simply leave them insufficient time to produce such, given their day jobs. And possibly drive away a fraction of subscribers such that the site loses its economic viability.
5:03 PM Oct 7th

tangotiger
He is specifically answering this question: "Do power pitchers perform better or worse compared to finesse pitchers who have otherwise performed equally". But, the reason that they performed otherwise equally is because the finesse pitchers managed to post a better BABIP. They compensated for the lower K-BB differential by posting a better BABIP. And, in the out-of-sample results, our expectation is that the ERA will increase because the BABIP was not controlled for. The determinant is not the power/finesse categorization. Indeed, that the finesse pitchers posted a smaller differential already tells me that they are not otherwise equal. Groups of pitchers that have a smaller K-BB differential are expected to post ERA in the out-of-sample data worse than guys with higher K-BB differential. If the question is nuanced, then the results must be nuanced. And if it's that nuanced, who is going to understand it?
4:12 PM Oct 7th

Richie
If Bill were studying 'what type of pitchers do well', it would be 'biased'. He was instead answering a discrete question: Do power pitchers indeed do better than finesse ones? A touch more on this semantic question later. Right now I have to get back to work!
4:00 PM Oct 7th

tangotiger
BABIP is *not* random. But, Bill needs to control for that. What we see here is that there are TWO uncontrolled for parameters: (1) one is the enormous K and BB difference (representing power/finesse), and (2) the other is the difference in BABIP. And what do we find in the out-of-sample results? That there is a difference of 0.25 runs. (And our supposition is that if Bill were to show the results of the following regular season, we'd also find a 0.25 difference.) However, the reason is NOT because of the power/finesse variable. Because there are two uncontrolled for variables, we cannot tell why there is a difference. And by NOT acknowledging the BABIP parameter as existing, the results are *biased*.
3:30 PM Oct 7th

Trailbzr
Richie, I think you and I unwittingly walked into the cross-fire of one of the hot issues in current statistical-sabermetrics -- whether BABIP is random, or if there are really pitchers that those of my generation associate with Tommy John or Dan Quisenberry who are good at inducing outs-in-play.
Tango and MGL appear to belong to the school of thought that BABIP is random (given any one defense), so if you match pitchers on ERA and HR, and then split them on strikeouts, you've implicitly created a high- and low-BABIP group; which in their view means a "lucky" and "unlucky" BABIP group.
I think BillJ said in his most recent Historical that BABIP is less-repeatable than other pitching skills, but not that there's no such thing.
12:00 PM Oct 7th

Richie
Nothing flawed about the study that I see. Nor biased. Bill found/confirmed that power pitchers do better in the post-season than similarly-performing finesse pitchers. You guys found an/perhaps the underlying reason for that.
11:21 AM Oct 7th

mgl
Richie, that is exactly correct! If a power pitcher and a finesse pitcher were truly of equal talent, then yes, I would expect them to perform exactly the same in the post-season (given a large enough sample of course).

Now, I don't know that for sure, but we don't know it either from Bill's study because his study was severely flawed for the reasons that Tango and I explained (that both groups were NOT matched in true talent). Yes, they were matched by performance, but the way he matched them created a bias such that one group was "luckier" than the other in the initial sample and thus in ANY OTHER sample (post-season or otherwise), you would expect that the lucky group would do worse than the unlucky group (since luck disappears in an unbiased sample, which is the second sample - in this case, the post-season), and it has nothing to do with the fact that the second sample is comprised of post-season play.
3:02 AM Oct 7th

Richie
I don't see where Bill matched them according to talent. He matched them according to results.

Critiquers, not that you're obligated to dumb it down for the rest of us. But I don't get what your point is regarding BABIP. I think you're saying that the finessers only 'regular-season matched' the power guys by better luck on BABIP, which then understandably washed out come playoff time. Ergo, a finesser and a powerer with equal luck could reasonably be expected to pitch equally well come post-season. But darned if I'm at all sure that's what you're saying.
6:43 PM Oct 6th

tangotiger
I agree with MGL that if you looked at the regular season record of those 200 pitchers in the *following* regular season (as opposed to the post-season), that you will likely find a similar ERA gap. This simply points to the BABIP bias.

2:50 PM Oct 6th

mgl
Bill, Tango is right of course. You would expect that the power pitchers would outdo the finesse pitchers in the post-season, or any time period outside of the initial sample time period, for the reasons that Tango mentioned. Your flaw is that the groups are NOT matched according to "true talent" as you say in your article.

If a group of power pitchers has the same ERA (and w/l, etc.) as a group of finesse pitchers, the finesse pitchers will have a lower BABIP than the power pitchers, especially if you control for HR rate, which I think you did.

This means that the power pitchers in the group have been slightly unlucky and/or the finesse pitchers have been slightly lucky. When we look at any other time period, including post-season play, we will see that both groups' BABIP will regress towards league average (almost 100%) and that the power pitchers' ERA will "go down" and the finesse pitchers' ERA will "go up" (as you found) because of regression toward the mean, not because one is more suited to post-season play than the other.

If you doubt that, do the same exact study, but look at any other time period for both groups outside of the sample time period. For example, the next season. Or leave out September in the initial sample and then look at September. You will find the same results. It has nothing to do with the "post-season."

Trailbzr, you are missing the point of Tango's critique. The power pitchers' ERA during the reg season was an "unlucky ERA." Their BABIP was higher than league average. Their true talent is better than the 3.33. Their ERA in the reg season "should have been" in the neighborhood of 3.20 or so. So it actually rose around .15 runs in the post-season, as it should against better hitters.

Why such a small rise when they are probably facing hitters who are .3 or .4 runs better than average? Because the weather is much colder and because the relievers who follow them (and influence their ERA) are much better.
2:07 PM Oct 6th

Trailbzr
Agree... good study and good to post it despite the tepidity of the conclusions.
But what I find most noteworthy is that power pitchers' ERA was 3.33 in regular season and 3.35 in postseason, when the quality of opponent is obviously better. If it's true that power pitchers' ability varies less with the quality of opponent than a finesse pitcher's does, that alone could have some important implications for personnel use.
1:21 PM Oct 6th

tangotiger
Very enjoyable study.

If you look at the BABIP (batting average on balls in play, or H minus HR divided by PA minus BB, K, HBP, HR), I think you will find that the finesse pitchers ended up with a BABIP of 10 or 12 points better. Or, probably a bit more lucky than the power pitchers that year. So, I think the study is biased in that while the component ERA may come out as equal for the two groups, the component ERA of the power pitchers is more indicative of the true talent.

I estimate the 10-12 point difference in BABIP to be worth roughly 0.20-0.30 in ERA, thereby giving you a perfect match for the post-season difference.
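
For reference, the BABIP definition given in parentheses above can be sketched as a small function. The parameter names are illustrative and the sample line is made up; other published definitions of BABIP use a slightly different denominator.

```python
def babip(hits, home_runs, plate_appearances, walks, strikeouts, hit_by_pitch):
    """Batting average on balls in play, as defined in the comment above:
    (H - HR) / (PA - BB - K - HBP - HR)."""
    balls_in_play = (plate_appearances - walks - strikeouts
                     - hit_by_pitch - home_runs)
    return (hits - home_runs) / balls_in_play


# Made-up season line for illustration only, not taken from the study.
example = babip(hits=190, home_runs=18, plate_appearances=850,
                walks=60, strikeouts=110, hit_by_pitch=5)
print(round(example, 3))
```

A gap of "10 or 12 points" in this statistic means a difference of .010 to .012 in the value returned.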
12:54 PM Oct 6th
