A. Stating The Question
To what extent does John Dewan’s +/- fielding rating match with other, more primitive methods to evaluate fielders? Let us suppose, for the sake of argument, that John’s system is 100% accurate in evaluating fielders. If John’s system is (or if it were) 100% accurate, then the accuracy of any other fielding metric could be found by finding the extent to which it agrees with the +/-. If the +/- is 100% accurate and another system agrees with the +/- 58% of the time, then how accurate is the other system? It’s 58% accurate, right?
In the modern world, the last five to ten years, we have very good ways of evaluating fielders. It is no longer within the range of substantive dispute whether a given player is a quality defensive player. There is a blizzard of data available, and there are many people looking at that data and reaching consonant conclusions about fielders. Essentially, we now know how good a fielder someone is.
However, most of baseball history lies outside the reach of this blizzard of data. We could extend the benefits of modern analysis to baseball history if we could develop methods that were consistent with the modern methods. We are, for the first time, in a position to answer very basic and important questions such as “How accurate is fielding percentage, as a measure of a player’s defensive ability?” As a first step toward that goal, I thought it would be helpful to ask this set of questions:
1) To what extent does the evaluation of fielders by the +/- system agree with range factors?
2) To what extent does the evaluation of fielders by the +/- system agree with the evaluation of the same fielders by fielding percentage?
3) To what extent does the evaluation of fielders by the +/- system agree with the evaluation of the same fielders by the ratio of double plays to errors?
B. Outlining the Method
This study deals only with third basemen, and only with the seasons 2005 through 2007. The study group for this report is “all major league third basemen playing 900 or more innings (cumulative) in the years 2005 through 2007”. There are 41 such players.
I began by putting together three-year fielding records for those 41 players, using the Fielding Bible Data given in the statistics section of this service. These records are given below. The categories of this record are the Player’s Name (Player), his full defensive innings (Inn) and thirds of an inning (I3), his Putouts (PO), Assists (A), Errors (E) and Double Plays (DP), his Fielding Percentage (F Pct), his Range Factor (RF; range factor is putouts plus assists per nine innings), his Expected Plays Made on ground balls by the +/- system (Ex P), his Actual Plays Made on ground balls by the +/- system (Plays), the difference between these two (Diff), his +/- on balls hit in the air (Air), the total +/- on ground balls and balls hit in the air (+/-), the Enhanced Fielding Rating (Enh, the Enhanced Fielding Rating being the +/- adjusted for extra base hits, so that cutting off a double is of more value than cutting off a single), and the Enhanced Fielding Rating per 1000 innings (Per In).
| Player | Inn | I3 | PO | A | E | DP | F Pct | RF | Ex P | Plays | Diff | Air | +/- | Enh | Per In |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alfonzo, Edgardo | 913 | 2 | 81 | 177 | 8 | 11 | .970 | 2.54 | 177 | 164 | -13 | 0 | -13 | -11 | -12.0 |
| Atkins, Garrett | 3862 | 0 | 260 | 800 | 50 | 93 | .955 | 2.47 | 789 | 766 | -23 | -2 | -25 | -19 | -4.9 |
| Bautista, Jose | 1390 | 2 | 123 | 319 | 22 | 23 | .953 | 2.86 | 328 | 295 | -33 | 0 | -33 | -34 | -24.4 |
| Bell, David | 2512 | 2 | 188 | 599 | 40 | 49 | .952 | 2.82 | 523 | 559 | 36 | 1 | 37 | 29 | 11.5 |
| Beltre, Adrian | 3963 | 0 | 397 | 881 | 47 | 81 | .965 | 2.90 | 819 | 853 | 34 | -2 | 32 | 44 | 11.1 |
| Betemit, Wilson | 1441 | 2 | 81 | 291 | 19 | 33 | .951 | 2.32 | 273 | 274 | 1 | -1 | 0 | -3 | -2.1 |
| Blake, Casey | 1249 | 0 | 103 | 267 | 15 | 25 | .961 | 2.67 | 269 | 262 | -7 | 0 | -7 | -4 | -3.2 |
| Blalock, Hank | 2776 | 0 | 186 | 610 | 29 | 52 | .965 | 2.58 | 606 | 585 | -21 | -1 | -22 | -27 | -9.7 |
| Boone, Aaron | 2160 | 2 | 143 | 493 | 34 | 36 | .949 | 2.65 | 471 | 455 | -16 | -2 | -18 | -16 | -7.4 |
| Braun, Ryan | 945 | 1 | 61 | 161 | 26 | 12 | .895 | 2.11 | 186 | 148 | -38 | -2 | -40 | -39 | -41.3 |
| Cabrera, Miguel | 2882 | 2 | 236 | 578 | 42 | 71 | .951 | 2.54 | 582 | 544 | -38 | 1 | -37 | -38 | -13.2 |
| Castilla, Vinny | 1755 | 1 | 197 | 324 | 16 | 31 | .970 | 2.67 | 308 | 301 | -7 | -1 | -8 | -5 | -2.8 |
| Chavez, Eric | 3288 | 2 | 292 | 751 | 26 | 86 | .976 | 2.85 | 705 | 720 | 15 | 0 | 15 | 27 | 8.2 |
| Crede, Joe | 2768 | 2 | 245 | 679 | 24 | 80 | .975 | 3.00 | 609 | 661 | 52 | 1 | 53 | 43 | 15.5 |
| Encarnacion, Edwin | 2577 | 1 | 240 | 524 | 51 | 47 | .937 | 2.67 | 530 | 498 | -32 | 5 | -27 | -30 | -11.6 |
| Ensberg, Morgan | 2846 | 1 | 220 | 661 | 39 | 71 | .958 | 2.79 | 614 | 632 | 18 | 1 | 19 | 15 | 5.3 |
| Feliz, Pedro | 3184 | 0 | 256 | 777 | 38 | 75 | .965 | 2.92 | 664 | 717 | 53 | 6 | 59 | 58 | 18.2 |
| Figgins, Chone | 1554 | 2 | 108 | 310 | 26 | 26 | .941 | 2.42 | 302 | 295 | -7 | -2 | -9 | -6 | -3.9 |
| Glaus, Troy | 3367 | 0 | 271 | 778 | 47 | 86 | .957 | 2.80 | 749 | 746 | -3 | 6 | 3 | -1 | -0.3 |
| Gordon, Alex | 1135 | 0 | 99 | 247 | 14 | 22 | .961 | 2.74 | 232 | 241 | 9 | -3 | 6 | 9 | 7.9 |
| Inge, Brandon | 4101 | 1 | 354 | 1101 | 63 | 100 | .958 | 3.19 | 1040 | 1090 | 50 | -3 | 47 | 64 | 15.6 |
| Iwamura, Akinori | 1042 | 1 | 79 | 197 | 7 | 17 | .975 | 2.38 | 200 | 194 | -6 | -4 | -10 | -7 | -6.7 |
| Izturis, Maicer | 1430 | 0 | 86 | 298 | 24 | 27 | .941 | 2.42 | 274 | 276 | 2 | -1 | 1 | 2 | 1.4 |
| Jones, Chipper | 2799 | 1 | 242 | 572 | 32 | 56 | .962 | 2.62 | 538 | 523 | -15 | -2 | -17 | -17 | -6.1 |
| Koskie, Corey | 1277 | 2 | 106 | 308 | 14 | 30 | .967 | 2.92 | 278 | 292 | 14 | -1 | 13 | 11 | 8.6 |
| Kouzmanoff, Kevin | 1151 | 1 | 93 | 213 | 23 | 12 | .930 | 2.39 | 206 | 202 | -4 | 1 | -3 | -5 | -4.3 |
| Lowell, Mike | 3749 | 1 | 355 | 821 | 27 | 107 | .978 | 2.82 | 779 | 778 | -1 | -1 | -2 | 6 | 1.6 |
| Mora, Melvin | 3664 | 0 | 275 | 857 | 45 | 59 | .962 | 2.78 | 821 | 824 | 3 | 3 | 6 | 7 | 1.9 |
| Mueller, Billy | 1465 | 1 | 102 | 325 | 18 | 27 | .960 | 2.62 | 318 | 321 | 3 | 2 | 5 | 4 | 2.7 |
| Nunez, Abraham | 1946 | 1 | 131 | 529 | 27 | 41 | .961 | 3.05 | 499 | 501 | 2 | 1 | 3 | 7 | 3.6 |
| Punto, Nick | 1663 | 1 | 142 | 369 | 16 | 31 | .970 | 2.76 | 330 | 352 | 22 | 1 | 23 | 27 | 16.2 |
| Ramirez, Aramis | 3464 | 2 | 268 | 730 | 39 | 50 | .962 | 2.59 | 676 | 686 | 10 | -2 | 8 | 11 | 3.2 |
| Randa, Joe | 1521 | 2 | 157 | 295 | 16 | 28 | .966 | 2.67 | 307 | 289 | -18 | 1 | -17 | -18 | -11.8 |
| Rodriguez, Alex | 4002 | 1 | 317 | 800 | 49 | 80 | .958 | 2.51 | 799 | 792 | -7 | 2 | -5 | -9 | -2.2 |
| Rolen, Scott | 2636 | 2 | 200 | 695 | 31 | 71 | .967 | 3.05 | 600 | 653 | 53 | -1 | 52 | 51 | 19.3 |
| Sanchez, Freddy | 1299 | 1 | 98 | 373 | 10 | 43 | .979 | 3.26 | 319 | 344 | 25 | -3 | 22 | 28 | 21.5 |
| Teahen, Mark | 1992 | 0 | 192 | 481 | 34 | 54 | .952 | 3.04 | 475 | 452 | -23 | 3 | -20 | -33 | -16.6 |
| Tracy, Chad | 1652 | 0 | 130 | 333 | 29 | 33 | .941 | 2.52 | 330 | 324 | -6 | 1 | -5 | -8 | -4.8 |
| Wigginton, Ty | 1226 | 2 | 92 | 260 | 22 | 17 | .941 | 2.58 | 272 | 240 | -32 | 0 | -32 | -40 | -32.6 |
| Wright, David | 4188 | 0 | 315 | 949 | 64 | 77 | .952 | 2.72 | 886 | 882 | -4 | -4 | -8 | -11 | -2.6 |
| Zimmerman, Ryan | 2911 | 0 | 298 | 637 | 38 | 74 | .961 | 2.89 | 563 | 584 | 21 | 3 | 24 | 21 | 7.2 |
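For readers who want to check the derived columns, here is a minimal Python sketch of how they appear to be computed. The function names are hypothetical. Range Factor and the per-1000-innings rate follow the definitions given in the text; the fielding percentage formula, (PO + A) / (PO + A + E), is the conventional definition, which the text does not spell out.

```python
def innings(full, thirds):
    """Combine full innings (Inn) and thirds of an inning (I3)."""
    return full + thirds / 3


def range_factor(po, a, inn):
    """Putouts plus assists per nine innings."""
    return 9 * (po + a) / inn


def fielding_pct(po, a, e):
    """Conventional fielding percentage (assumed, not defined in the text)."""
    return (po + a) / (po + a + e)


def per_1000_innings(enh, inn):
    """Enhanced Fielding Rating scaled to 1000 innings."""
    return 1000 * enh / inn


# Alex Rodriguez's row: 4002 1/3 innings, 317 PO, 800 A, 49 E, Enh of -9
inn = innings(4002, 1)
print(round(range_factor(317, 800, inn), 2))   # 2.51
print(round(fielding_pct(317, 800, 49), 3))    # 0.958
print(round(per_1000_innings(-9, inn), 1))     # -2.2
```

The three printed values match the RF, F Pct and Per In entries in Rodriguez's row.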
The method used here was to look for “agreement” or “disagreement” between two methods of comparison. Let us take, for example, the two New York third basemen, Alex Rodriguez and David Wright. Rodriguez has a Range Factor of 2.51, and is rated by the +/- system (enhanced) at –9 plays in 4002.1 innings, which is –2.2 plays per 1000 innings. Wright has a Range Factor of 2.72, and is rated by the +/- system at –11 plays in 4188 innings, which is –2.6 plays per 1000 innings. So, do the two systems agree about which is the better fielder, or do they disagree?
Obviously, they disagree. Range Factor says that Wright is better than Rodriguez (by a relatively small margin), while the +/- system says that Rodriguez is better than Wright (by a very tiny margin). Our method here is simply to make every possible player-to-player comparison, and to count how many times the two systems agree. With 41 players in the study there are 820 player-to-player comparisons {(41 * 40) ÷ 2}.
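The counting procedure can be sketched in a few lines of Python. This is an illustration, not the original worksheet: the function name `count_agreement` and the dictionary keys are hypothetical, the sample uses only five of the 41 players (RF and Per In values taken from the table), and skipping tied pairs is an assumption of this sketch.

```python
from itertools import combinations
from math import comb


def count_agreement(players, key_a, key_b):
    """Count pairs on which two metrics agree, disagree, or tie."""
    agree = disagree = ties = 0
    for p, q in combinations(players, 2):
        diff_a = p[key_a] - q[key_a]
        diff_b = p[key_b] - q[key_b]
        if diff_a == 0 or diff_b == 0:
            ties += 1          # assumption: a tie on either metric is not counted
        elif (diff_a > 0) == (diff_b > 0):
            agree += 1         # both metrics prefer the same player
        else:
            disagree += 1
    return agree, disagree, ties


# Five-player sample from the table: Range Factor and Enhanced +/- per 1000 innings
sample = [
    {"name": "Rodriguez, Alex", "rf": 2.51, "per_in": -2.2},
    {"name": "Wright, David", "rf": 2.72, "per_in": -2.6},
    {"name": "Lowell, Mike", "rf": 2.82, "per_in": 1.6},
    {"name": "Rolen, Scott", "rf": 3.05, "per_in": 19.3},
    {"name": "Braun, Ryan", "rf": 2.11, "per_in": -41.3},
]

print(count_agreement(sample, "rf", "per_in"))  # (9, 1, 0)
print(comb(41, 2))                              # 820
```

In this small sample the only disagreement is the Rodriguez and Wright pair discussed above.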
C. Simple or Direct Results
Fielding Percentage. Let’s start with Fielding Percentage, since fielding percentage is still the most-commonly cited fielding statistic. Fielding Percentage and the +/- system agreed as to which of two third basemen was a better fielder 551 times in the 820 comparisons, and disagreed the other 269 times. Fielding Percentage thus agreed with the +/- system 67% of the time—a little more than two times in three.
Double Plays to Errors Ratio. It was actually Double Plays to Errors Ratio that started this project. I noticed, looking at the data, that Mike Lowell over the last three years has a double plays to errors ratio of 107 to 27, which, just looking at the data for a few other third basemen, seemed to be quite exceptional. Adrian Beltre, for example, was 81 to 47, Ryan Zimmerman 74 to 38, Aramis Ramirez 50 to 39, Melvin Mora 59 to 45, and several of the less experienced third basemen are actually less than even. I have always advocated Double Plays to Errors ratio as a simple way to get a line on a fielder’s ability, and when I saw that I thought “wouldn’t it be interesting if the DP/Errors ratio, over a period of years, turned out to be a highly reliable indicator of defensive quality vis a vis these new defensive metrics?”
The Double Plays to Errors Ratio may be measuring something that the +/- is actually missing; I’m not sure, and perhaps John can add a postscript explaining that. My opinion is that Lowell has an exceptional double plays to errors ratio in part because he has an exceptionally accurate throwing arm. If you think about a third baseman making a throw to start a double play, every inch that the throw is off target probably has a measurable impact on the number of double plays resulting. If the throw to the second baseman covering is three feet off target (let’s say that it’s three feet high, or that it pulls the second baseman three feet out toward left field), that almost certainly is going to eliminate the chance of getting a double play, most of the time. If the throw is one foot off target, it certainly would make an easily measurable difference in the chance of turning the double play.
Lowell over the three years of the study has worked with three different primary second basemen—Luis Castillo in 2005, Mark Loretta in 2006, Dustin Pedroia in 2007—so it is not likely that his exceptional Double Play total would result from the actions of the second basemen. On the other hand, it might be dangerous (in normal cases) to evaluate a third baseman by his ability to start double plays, because it might be (in other cases) that the second baseman controls this more than or as much as the third baseman himself. Thus, it might be that this measurement is picking up something that the +/- system can’t really key on.
In any case, the Double Plays/Errors ratio agrees with the +/- system as to which of two third basemen is a better fielder 567 times in 818 comparisons, or 69% of the time, slightly higher than the degree of agreement between Fielding Percentage and Dewan +/-. Only 818 of the 820 comparisons count here, because two pairs of players had identical Double Plays to Errors Ratios.
Range Factor. For the 820 possible comparisons there are 597 cases in which Range Factor and Dewan’s +/- agree as to which is the better fielder, and 223 cases in which they disagree. There is 73% agreement between the two—actually 72.8%.
D. Implied Results
This is a study of just 41 players at one position, and nothing can be safely inferred from this data. However, let us assume for the sake of argument that this 72.8% figure would hold up with additional research.
Based on that result, I think we could safely conclude that Range Factors are 75 to 80% accurate in evaluating fielders. I reason as follows:
Two systems can agree upon a result if they are both correct, or if they are both incorrect. Suppose that you have two systems which are both 90% accurate in a comparison of this nature. How often would they agree?
If they are randomly aligned against the truth, they would agree 82% of the time: .9 * .9 when they are both right, plus .1 * .1 when they are both wrong. 81% plus 1% = 82%.
If each system is 80% accurate, then they would be expected to agree 68% of the time (64% plus 4%). If one system was 90% accurate and the other was 60% accurate, they could be expected to agree 58% of the time: .9 * .6, plus .1 * .4. We can call this number the agreement rate.
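The agreement rate described here is a one-line calculation. A minimal Python sketch follows; the function name `agreement_rate` is hypothetical, and the formula assumes, as the text does, that the two systems' errors are randomly aligned (independent of each other).

```python
def agreement_rate(p, q):
    """Expected agreement between two systems with accuracies p and q.

    They agree when both are right (p * q) or both are wrong
    ((1 - p) * (1 - q)), assuming their errors are independent.
    """
    return p * q + (1 - p) * (1 - q)


# The three cases worked in the text
for p, q in [(0.9, 0.9), (0.8, 0.8), (0.9, 0.6)]:
    print(p, q, round(agreement_rate(p, q), 2))
```

The loop reproduces the 82%, 68% and 58% figures from the three cases above.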
We have been assuming for the sake of argument that Dewan’s +/- system was a perfect test of a fielder’s ability, but of course this cannot be true. There are many distinctions in the data which are razor thin—like the difference between David Wright and Alex Rodriguez—and it is unreasonable to assume that the system gets all of those distinctions right.
Using these assumptions, if a system agrees with the Dewan method 72.8% of the time, the only way that it can be less than 72.8% accurate is if the Dewan system is less than 27.2% accurate. If we assume that the Dewan +/- is 99% accurate, then to agree with that method 72.8% of the time, Range Factors would need to be .7327 accurate:
.99 * .7327 = .7254
.01 * .2673 = .0027
Agreement Rate .7281
If Dewan’s method is 95% accurate, then Range Factors would need to be 75.3% accurate in order for the two to be expected to agree 72.8% of the time. The lower the accuracy assigned to Dewan’s method, the more accuracy must be assigned to Range Factors in order to get the same degree of agreement between the two. This remains true until Dewan’s method is assumed to be only 72.8% accurate, at which point Range Factors would have to be 100% accurate; between 27.2% and 72.8%, no level of accuracy for Range Factors can produce 72.8% agreement. Below 27.2%, you can get a theoretical agreement factor of .728 or higher by assuming that both methods are wrong a very high percentage of the time.
But that’s an absurd conclusion; no reasonable person could believe that both methods are wrong 75% of the time, nor is it possible to explain how that could happen. Therefore, we have to conclude that the only way for Range Factors to agree with Dewan’s +/- 72.8% of the time is if they are more than 72.8% accurate.
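The algebra behind this step can be written out explicitly. As a sketch, let a be the observed agreement rate, p the accuracy of Dewan’s +/-, and q the accuracy of the other metric (these symbols are introduced here for illustration and do not appear in the original), and keep the assumption that the two systems’ errors are independent:

```latex
\begin{align*}
a &= pq + (1-p)(1-q) \\
q &= \frac{a + p - 1}{2p - 1}, \qquad p \neq \tfrac{1}{2} \\
q - a &= \frac{(1-p)(2a-1)}{2p-1}
\end{align*}
```

With a = .728, so that 2a - 1 is positive, the last line is positive whenever p is above 1/2, which means q exceeds .728. The value of q can fall below .728 only when p is below 1/2, and keeping q at or above zero then requires p to be no more than 1 - a = .272.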
Let us assume, as one border, that Dewan’s +/- system cannot be more than 95% accurate in comparing player to player, since
a) many of the comparisons are close, and
b) there is a measure of randomness in all outcomes.
If we assume that Dewan’s +/- system is 95% accurate or less, this implies that Range Factors are 75% accurate or more.
For a border on the other end, suppose that we assumed that Dewan’s +/- was 80% accurate. In order to get 72.8% agreement with a system that was 80% accurate, Range Factors would have to be 88% accurate.
But it seems entirely unreasonable to believe that Range Factors, a primitive method prone to many different illusions of context, can be 88% accurate if Dewan’s +/-, a sophisticated analysis which attempts to remove every illusion of context, is only 80% accurate. It seems unreasonable to believe that Range Factors are as accurate as the +/-, let alone that they are more accurate.
If we assume that Dewan’s +/- is 88% accurate, that would imply that Range Factors are 80% accurate, in order to get an Agreement Rate of .7280:
.88 * .8000 = .7040
.12 * .2000 = .0240
Agreement Rate .7280
This would seem to be as far as it is reasonable to go in assuming that Range Factor and Dewan’s +/- are of comparable accuracy—88% for Dewan, 80% for Range Factor. Therefore, we must conclude that Range Factors are not more than 80% accurate in comparing the relative strength of two fielders.
Range Factors appear to be 75 to 80% accurate in evaluating the relative abilities of two fielders.
By the same logic, Fielding Percentages would appear to be 69 to 73% accurate in evaluating the relative abilities of two fielders, and the Double Plays to Errors Ratio would appear to be 72 to 75% accurate.
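These ranges can be checked by solving the agreement-rate formula for the unknown accuracy. The Python sketch below is an illustration under the text's assumptions, not the original calculation: the function name `implied_accuracy` is hypothetical, the agreement counts are the ones reported earlier, and 95% and 88% are the two borders for Dewan's +/- used in the argument above.

```python
def implied_accuracy(agreement, reference_accuracy):
    """Solve agreement = p*q + (1-p)*(1-q) for q, where p is the reference accuracy."""
    p = reference_accuracy
    if p == 0.5:
        # With a 50% accurate reference the agreement rate is always 0.5,
        # so q cannot be recovered.
        raise ValueError("reference accuracy of 0.5 carries no information")
    return (agreement + p - 1) / (2 * p - 1)


# Agreement counts reported in the text: (agreements, comparisons)
observed = {
    "Range Factor": (597, 820),
    "Fielding Percentage": (551, 820),
    "Double Plays to Errors": (567, 818),
}

for name, (agree, total) in observed.items():
    rate = agree / total
    low = implied_accuracy(rate, 0.95)   # border: Dewan's +/- at 95%
    high = implied_accuracy(rate, 0.88)  # border: Dewan's +/- at 88%
    print(f"{name}: agreement {rate:.3f}, implied accuracy {low:.3f} to {high:.3f}")
```

For Range Factor the two borders come out at roughly 75% and 80%, consistent with the range stated above.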
E. Housekeeping
Again, I will acknowledge that one study of the players at one position is not a reliable test of anything. My purpose here was more to outline a pathway for future research than to stake a claim for any particular results.
Also, the reliability of the various methods here is being tested on multi-year data, which must be assumed to be substantially more accurate than single-season data. If Range Factors are 75 to 80% accurate in a three-year look, this would suggest that they are something less than 75 to 80% accurate in a single-season study.
Also, the tests here were conducted by comparing the Range Factors to the Enhanced +/- rating, not to the raw +/- rating. The reason this was done is that the agreement between Range Factors and the Enhanced +/- was slightly higher than the agreement with the raw +/-. I tested it both ways and used the higher figure, although the figure for the raw +/- was just a little bit lower. Whereas the count was 597-223 for agreement between Range Factor and the Enhanced +/-, it was 593-227 for agreement with the “un-enhanced” or “raw” +/-.