Evans and Parker, Defense 1986

By Bill James

February 23, 2012

A couple of weeks ago I wrote an article for Grantland about Dwight Evans, arguing that Evans was a Hall of Fame caliber player. The article contrasted Evans’ contribution to his team with that of Dave Parker (among other players), and one section of the article contrasted the defensive contributions of the two players in the 1986 season. In that section I wrote this:

Because of the difference between them in range, however, Baseball Reference estimates that Parker in 1986 was 17 runs worse in the outfield than an average right fielder, whereas Evans was 8 runs better. That’s 25 runs.

I don’t know how they calculate that, and, because defense is so hard to measure, I prefer to use more conservative measurements. The difference between an average team and a championship team, in a season, is only about 150 runs. Saying that the fielding difference between two right fielders is 25 runs is a little like saying that a 150-pound woman gave birth to a 25-pound baby. Ouch. I’m not saying it’s not possible; it’s just hard to believe. I have Evans as being only about eight runs better than Parker in the field, not because I don’t believe the 25-run difference is possible, but just because I just don’t think that we know for certain how large the difference was. Parker also had been an outstanding defensive outfielder earlier in his career. But I don’t think anyone questions that, by 1986, Dwight Evans was a lot better outfielder than was Parker.

Some of you probably realized, in reading that, that I had used a deceptive comparison to illustrate my point. I compared a very good right fielder—Evans—to a very bad right fielder—Parker in 1986—but, to illustrate my point about the scale, I compared a very good team to an average team; not a bad team, but an average team. The difference between a very good team and an average team is about 150 runs, but the difference between a very good team and a very bad team is 300 runs. I was making the legitimate point that "that’s a lot of runs", but I did it in a little bit of a deceptive way.

You can flog me later, but first I wanted to look more carefully at the issue this raises: is it reasonably believable that the defensive difference between two right fielders would be 25 runs? Or, given the scale of differences between baseball teams, is that just too many runs?

How do we figure that?

I constructed a model to represent the problem. What do we know, as a basis of the model? We know that the standard deviation of runs scored in baseball (per team) is about 80 runs now, less in the 1980s. We know that there are nine positions that feed into that difference. We know that the defensive contributions of the nine positions vary widely, and we know that the defensive responsibilities of the pitcher are vastly larger than their offensive responsibilities, but that the defensive responsibilities of the other fielders are smaller than their offensive responsibilities. We know that the relative offensive/defensive responsibilities are different at different positions; shortstops have more defensive responsibilities relative to their offense than do first basemen.

I constructed the model in this way. . .this actually was easier than it probably sounds. From concept to completion of this little study was something less than an hour; it wasn’t complicated, and it was very convincing, to me. First, I created two lines of what I call "bell random" numbers. Ordinary random numbers, if you graph how often each value comes up, make a flat line; there are as many random numbers between .800 and 1.000 as there are between .400 and .600. If you take two random numbers, add them together and divide by two, you still have random numbers, but they pile up in the middle in a roughly bell-shaped curve (strictly speaking, a triangle peaked at .500), and you have four and a half times as many of them between .400 and .600 as between .800 and 1.000. For reasons that I would hope would be obvious to most of you, "bell random" numbers suit our purpose here better than true random numbers.
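A minimal sketch of the "bell random" idea, assuming Python's standard `random` module (the article does not say what tool was actually used). The function names `bell_random` and `band_ratio` are made up for illustration. The sketch counts how many draws fall in each band; the exact ratio for an average of two uniform numbers is .36 / .08 = 4.5.

```python
import random

def bell_random(rng):
    """Average of two uniform draws on [0, 1): peaked at .500, not flat."""
    return (rng.random() + rng.random()) / 2

def band_ratio(n=200_000, seed=0):
    """Ratio of draws in [.400, .600) to draws in [.800, 1.000)."""
    rng = random.Random(seed)
    draws = [bell_random(rng) for _ in range(n)]
    middle = sum(0.4 <= x < 0.6 for x in draws)
    upper = sum(0.8 <= x < 1.0 for x in draws)
    return middle / upper

ratio = band_ratio()
print(f"middle band is {ratio:.2f} times as common as the upper band")
```

With a couple of hundred thousand draws the simulated ratio should sit close to 4.5, with some sampling noise.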

Then, to represent each player, I formed an "offensive value" and a "defensive value". The "offensive value" for each player was a base number times the bell random number. The "defensive value" was the base number, times another bell random number, times a number representing the defensive responsibility of the position, which we could call the positional defensive fraction.

The "base number" was 100 in the first trial, and settled at 117 because at 117 we had the proportions we needed to represent the real world. The "positional defensive fraction" was .300 for right fielders (.150 for first basemen, .800 for catchers.) For each right fielder in the model, then, his "offensive value" was 117 times a bell random number, and his "defensive value" was 117 times a bell random number, times .300.

I created a large number of "model teams"—more teams in the model than in the real history of major league baseball. I then figured the standard deviation of team runs scored—a number that we know to be just short of 80 in real life. In the 1980s, not counting 1981, it was 71.8.

If the base number was 100, then the standard deviation of runs scored per team would only be 61.6, which is too small. If the base number was 120, then the standard deviation of runs scored per team would be 73.9, which is a little bit too large. The base number that works best, to make the standard deviation of runs scored what it ought to be in the 1980s, is 117.
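The calibration above can be sketched as follows. This is an assumption-laden reconstruction, not the author's spreadsheet: it assumes a team's runs scored is simply the sum of nine independent player offensive values (base number times a bell random number), and the names `team_runs_sd`, `n_teams` and `n_positions` are hypothetical.

```python
import random
import statistics

def bell_random(rng):
    """Average of two uniform draws on [0, 1)."""
    return (rng.random() + rng.random()) / 2

def team_runs_sd(base, n_teams=20_000, n_positions=9, seed=1):
    """Standard deviation of team runs scored across many model teams.

    Assumption: team runs = sum of nine independent offensive values.
    """
    rng = random.Random(seed)
    totals = [
        sum(base * bell_random(rng) for _ in range(n_positions))
        for _ in range(n_teams)
    ]
    return statistics.pstdev(totals)

for base in (100, 117, 120):
    print(base, round(team_runs_sd(base), 1))
```

Under these assumptions the three base numbers give standard deviations in the low 60s, low 70s and mid 70s, in line with the 61.6, 71.8 and 73.9 quoted in the text; small differences are expected from sampling and from any details of the original model not described here.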

The key question, then, is "what is the standard deviation of fielding runs by right fielders, in this model?" If the answer to that question was "3.0", then I was right in saying that 25 runs is too large to be a believable separation in defensive value between two right fielders, because that would be eight-plus standard deviations. But if the answer to that question was "10.0", then I would be clearly wrong; given realistic assumptions, it would be entirely possible that the defensive difference between two right fielders would be 25 runs, even though the sum total of all differences between the two teams would rarely be larger than 300 runs.

Answer?

I was dead wrong.

The standard deviation of runs saved by right fielders, in this model, was 7.30. Evans, at +8 runs, would be a little more than one standard deviation above the norm, on a team level (assuming one full-time right fielder for each team.) Parker, at -17 runs, would be a little more than two standard deviations below the norm. It’s not an unbelievable defensive separation between two players, at all. If it was 40 runs, it wouldn’t be hard to believe. If it was 50 runs, maybe that’s a little hard to believe, but at 25, we’re not anywhere near the margins.
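A rough sketch of the right-field calculation, under the same assumptions as the model described above (defensive value = 117 times a bell random number times .300). The function name `right_field_sd` is hypothetical. In this sketch the standard deviation lands a little under the article's 7.30 (the exact value for this distribution is 117 × .300 / √24, about 7.16); the small gap may reflect sampling or details of the original model that are not given.

```python
import random
import statistics

def bell_random(rng):
    """Average of two uniform draws on [0, 1)."""
    return (rng.random() + rng.random()) / 2

def right_field_sd(base=117, fraction=0.300, n_players=100_000, seed=2):
    """Standard deviation of model defensive values for right fielders."""
    rng = random.Random(seed)
    values = [base * bell_random(rng) * fraction for _ in range(n_players)]
    return statistics.pstdev(values)

sd = right_field_sd()
evans_z = 8 / sd      # Evans, 8 runs better than average
parker_z = -17 / sd   # Parker, 17 runs worse than average
print(f"sd={sd:.2f}  Evans z={evans_z:.2f}  Parker z={parker_z:.2f}")
```

Either way, Evans comes out a little more than one standard deviation above average and Parker a little more than two below.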

Of course, it’s a crude model, and we can’t infer too much from it. But. . .I believe it. For the model to be wrong on this issue, it would have to be wrong by a substantial margin on one of its assumptions. Suppose we changed the "positional defensive fraction" for right fielders from .300 to .200. The standard deviation of runs saved by right fielders is then still 4.86, and at 4.86 it’s still believable that there would be a 25-run separation between two players. If the positional defensive fraction was .200, then only one-sixth of the responsibility of a right fielder (.200 out of 1.200) would be playing defense. It’s hard to believe that the defensive responsibility of a right fielder is smaller than that.
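The sensitivity check can also be worked out without simulation. The sketch below assumes the bell random number is the average of two independent uniform draws, whose variance is 1/24, and treats offense as weight 1 and defense as the positional fraction; `fielding_sd` and `defensive_share` are hypothetical helper names.

```python
import math

# sd of the average of two independent uniform(0, 1) draws
BELL_SD = math.sqrt(1 / 24)

def fielding_sd(base, fraction):
    """Standard deviation of model defensive value for one position."""
    return base * fraction * BELL_SD

def defensive_share(fraction):
    """Defense as a share of total (offense weight 1 + defense weight)."""
    return fraction / (1 + fraction)

for fraction in (0.300, 0.200):
    print(
        f"fraction={fraction:.3f}  "
        f"sd={fielding_sd(117, fraction):.2f}  "
        f"defensive share={defensive_share(fraction):.3f}"
    )
```

The exact figures come out slightly below the simulated 7.30 and 4.86 quoted in the text, and a fraction of .200 corresponds to a defensive share of exactly one-sixth.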

Yes, it’s a crude model, but the simplicity of the model is one of its strongest points. Because the model is based on very few assumptions, there are a very limited number of points about which it could be seriously wrong. To avoid the conclusion that a 25-run separation between defensive right fielders is entirely possible, the model has to be seriously wrong on some point. I doubt that it is.

Look, I still think I was right in one sense. I was trying to make an argument, and I was trying to convince people that I was right about what I was saying—as I believe that I was, and I think probably most of you believe that I was. You can’t convince people of what you are saying if you make assumptions that you can’t support; therefore, to convince people, you need to use conservative assumptions. If I had assumed that Evans was 25 runs better than Parker (defensively) in 1986, some people might very reasonably have said, "Oh, that’s too many runs for that pocket; he’s got an elephant sitting in a pool pocket there, he doesn’t know what he’s talking about." To avoid that, I scaled the number of runs back to a very conservative estimate, eight runs. I wasn’t wrong to do that, and I wasn’t trying to mislead people; I was merely trying to use conservative estimates so as to give skeptics every opportunity to buy into my argument.

But just between you and me, just between friends—25 runs is a realistic estimate, too. That’s what I know today that I didn’t know yesterday.

COMMENTS (9 Comments, most recent shown first)

bjames
Responding to Trailbzr. . .That's right. There IS an answer there, but it's very, very difficult to get to it, so we probably won't know for sure what the answer to that one is for several years yet. I doubt very seriously that it matters with respect to the narrow issue of the standard deviation of fielding runs.
11:29 AM Feb 29th

Trailbzr
There's an important point in the most recent exchange, but I'm not sure how it shakes out. "Is there a correlation among the defensive contributions of the fielders on one team?" And if so, is it positive or negative?

On the one hand, it seems the correlation could be positive -- sign a great fielding shortstop, and the current marginal one might move to second or third, where he's better suited. On the other hand, being able to play a defensive liability in LF might be empowered by having an outstanding CF.

There's a research project there; though it would be hell to check, since just about every good tool for evaluating individual defense takes the team's overall defense into account, and hence makes the correlation among individuals on one team difficult to isolate.

4:49 PM Feb 27th

bjames
Responding to Robinsong. .. .Statistical models are always SIMPLE imitations of COMPLEX realities. There are always a thousand features of real life that are missing from the model. Feel free to build a more complicated model if you can figure out how to do so.

Addressing the question of whether this is a meaningful discrepancy or not. . .I would suspect not. The model is set so that the standard deviation of runs is the same as in real life--without regard to the question of whether there are ACCIDENTAL accumulations of talent or CAUSAL accumulations of talent. Your point, then, only seems to be relevant if the assemblages of talent are Accidental for offense but Causal for fielding, or vice versa. This seems improbable.

It is obviously true that talent alignments are not random, but the non-randomness would have to operate at a fairly high level for your point to have import. If the non-randomness DID operate at that level then, for example, teams which were good on offense would also have a strong tendency to be good on defense (pitching/defense). The actual tendency of teams which are good on offense to be also good on pitching/defense is very, very weak, and would be zero if you removed from the data the bottom 5% of the teams, the kind of non-competitive teams which have basically given up. Thus, it seems unlikely to me that there IS any meaningful disparity caused by this problem. . .but as I said, feel free to build a better model and study it yourself.
9:29 AM Feb 27th

Robinsong
Bill -
While I love the approach, there seems to be a potentially fatal flaw in the calculation. To calibrate the individual variances, you use the team variance which is the composite of the individual ones. The problem arises if the individual draws are correlated, when I believe you are assuming that they are independent. If they are correlated, the derived individual variance should be lower than what you calculate. But correlation is certain: the Yankees and other rich teams will always be able to draw from the higher end of the distribution - they will tend to have good offense and good defense and good pitching and the individual players will tend to be better as well. If I am right, the true individual variances would be lower than you estimate and the difference between a good right fielder and bad one would be smaller.
2:40 PM Feb 26th

bjames
I'd be surprised if the conclusion in re Mantle and Mays was affected. 25 runs in ONE year is possible as the difference between a very good player and a very bad one. But a sustained, year-in, year-out difference between a great center fielder and a pretty good one. . .five runs seems like a lot, and 10 seems impossible. I don't really think they're related discussions.
12:43 AM Feb 26th

Rich Dunstan
How much, if at all, would this study alter your conclusion in the original Historical Baseball Abstract, that Willie Mays' defensive superiority couldn't have outweighed Mickey Mantle's offensive superiority in their peak years? I know Mickey was pretty decent out there in center, so I don't suppose it would overturn the difference, but I'd think it might narrow the gap.
2:58 PM Feb 25th

bjames
Responding to Izzy. .. I wouldn't say that I had under-estimated defense, no. What I would say is that it takes multiple approaches to get COMFORTABLE with estimates of defensive value. This is another approach, and it makes me a little more comfortable with values in this range.
6:59 PM Feb 23rd

Trailbzr
Intriguing statistical set-up; combined with the offensive measures by position Bill did in the Monday blog a couple of years ago, this could be a really good set-up to quantify the defensive spectrum.

In this particular comparison, the 25-range is possible because Parker was worse than you'd normally expect the worst RF to be; -2.3 standard deviations occurs randomly only one time in 100. The best of 30 should average 1.8 sd's better than average or 13 runs. Sounds plausible.
6:31 PM Feb 23rd

izzy24
Thanks for another great article, Bill. I'll admit that much of the math went over my head, but your findings are very interesting. Does this mean that you have undervalued the importance of defense? Or have you just underestimated the difference between a great and a terrible fielder?
6:16 PM Feb 23rd
