By John Dewan

December 31, 2014

Defensive analytics have grown by leaps and bounds in the last decade. At Baseball Info Solutions (BIS), we eat, sleep and breathe defense, but there is always more to learn. A recent research project uncovered some remarkable new information.

A common public perception has been that a player needs three full seasons before his defensive metrics provide a true indication of his defensive abilities. That has been my own rule of thumb, though I’ve known there is some reliability to sample sizes smaller than three years.

Based on the new research, BIS has found that Defensive Runs Saved (DRS) produces reliable results from a sample as small as 350 innings in the field (about a quarter of the season). This is a very significant finding.

The research produced another significant finding. Defensive Runs Saved is a better predictor than many other statistical measures in baseball even over limited samples. Most notably, DRS is a better predictor of future performance than batting average and OPS with partial season data.

We’ll have more on this in the upcoming book, *The Fielding Bible—Volume IV*, but here is a table that summarizes the results. We use the statistic called the correlation coefficient to show how predictive each statistic is—it produces a number between -1 and 1, with numbers near zero meaning no predictability and numbers near -1 and 1 meaning high predictability.

Correlation Coefficients of AVG, OPS, and DRS

| Statistic | 350 Innings | 700 Innings |
|---|---|---|
| Batting Average | 0.46 | 0.47 |
| OPS | 0.52 | 0.51 |
| DRS | 0.55 | 0.59 |

As you can see from the table, DRS is more predictive than batting average and OPS after just 350 innings. The same is true if you increase the samples to 700 innings.
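For readers who want to see how a correlation coefficient is calculated, here is a minimal Python sketch. The player totals are made-up numbers chosen only for illustration (they are not BIS data), and the function name `pearson_r` is our own label rather than anything from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical DRS totals for six players in an earlier sample and a
# later sample (illustrative values only, not real data).
drs_first_sample = [5, -3, 8, 0, -6, 2]
drs_next_sample = [3, -1, 4, 2, -5, -2]

r = pearson_r(drs_first_sample, drs_next_sample)
print(f"correlation between samples: {r:.2f}")
```

A value near 1 would mean that players who rated well in the first sample tended to rate well in the next one; a value near zero would mean the first sample tells you little about the second.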

In the study, we ran correlations of three years of defensive data versus the subsequent year's DRS totals for position players. The first run used 350 innings for DRS and 175 at-bats for batting average and OPS (each about one fourth of an MLB season) in both samples. The second run used 700 innings and 350 at-bats. The full explanation of the study of the predictive power of Defensive Runs Saved, as well as the rest of our latest defensive research, can be found in the upcoming *Fielding Bible—Volume IV*, which will be released in early spring of 2015.


## COMMENTS (7 Comments, most recent shown first)

jrickert

doncoffin, sorry for the delay (I got caught up in classes).

The slope of the regression line is b = ssxy/ssxx

The correlation coefficient is r = ssxy/sqrt(ssxx*ssyy), which is

r = b*sqrt(ssxx/ssyy)

If we use units in which the standard deviations are equal to 1, then r = b.

If ssxx = ssyy, then r = b.
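The identities above can be checked numerically. The following is a small Python sketch on made-up data; the helper names `slope_and_r` and `standardize` are hypothetical, introduced only for this illustration.

```python
import math

def slope_and_r(xs, ys):
    """Return (b, r, ss_xx, ss_yy) for a simple regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    b = ss_xy / ss_xx                      # regression slope
    r = ss_xy / math.sqrt(ss_xx * ss_yy)   # correlation coefficient
    return b, r, ss_xx, ss_yy

def standardize(vs):
    """Rescale values to mean 0 and sample standard deviation 1."""
    n = len(vs)
    mean = sum(vs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in vs) / (n - 1))
    return [(v - mean) / sd for v in vs]

# Made-up data, for illustration only.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 9.0]

b, r, ss_xx, ss_yy = slope_and_r(x, y)
print("r:", r)
print("b * sqrt(ssxx/ssyy):", b * math.sqrt(ss_xx / ss_yy))

# In standardized units the slope and the correlation should coincide.
b_std, r_std, _, _ = slope_and_r(standardize(x), standardize(y))
print("standardized slope:", b_std)
print("standardized r:", r_std)
```

The first two printed values should agree up to floating-point rounding, and so should the last two, since standardizing both variables makes ssxx equal to ssyy.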

I would be curious to see what values you got for ssxy, ssxx, ssyy, r, and b for the three regressions. (If you prefer to email, just google my name with math and baseball to find my Rose-Hulman page.)

2:18 PM Jan 24th

tdunn

Interesting data. Please clarify what those three variables predict. Is it games won?

4:31 PM Jan 2nd

doncoffin

jrickert--I've been running regressions for 35 years (that after taking a graduate sequence in econometrics), and I have never read of, or even heard someone espouse, that relationship. I have, by the way, constructed a data set relating a period 1 measure to a period 2 measure, with 3 period 2 measures. All 3 have correlation coefficients between 0.55 and 0.62. But the "b" coefficient in a regression

V2 = a + b*V1 + e

is very different for all three regressions. (I should note that the means of V1 and of the 3 V2s are identical (=5.0), and the standard deviations of V1 and the 3 V2s, while not identical, are all between 7 and 10.)

So I would need something of a more complete explanation of what you're saying before I could accept it.

3:24 PM Jan 2nd

tangotiger

My response:

tangotiger.com/index.php/site/article/are-defensive-runs-saved-predictive

9:22 AM Jan 1st

cderosa

Hi John,

As someone who bought the first three Fielding Bibles, I’ll take your mention of a fourth as the occasion to offer some feedback. First of all, I think every edition of the book has improved on the last, so credit where it is due on that! Something really lacking in the last two editions, though, was a serious explanation of the Good Fielding Plays and Defensive Mistakes. The 2nd edition introduced the concept, and the 3rd edition incorporated it into Defensive Runs Saved. But as of yet, I haven’t seen published anywhere a list of all the 82 Good Plays and Mistakes being counted and their exact definitions. This is crucial information for the reader who wants to understand what is being measured. If it is too late for inclusion in the book, I would urge you to publish it on this site.

Happy new year, Chris DeRosa

8:28 AM Jan 1st

jrickert

doncoffin, the correlation coefficient is the slope of the regression line when using what I refer to as statistical units. More technically, if one unit is equal to one standard deviation, rescaled for each variable, then the slope of the regression line is the same as the correlation coefficient.

In this case the standard deviations are probably close to the same size (as each other), so the slope of the regression line is (probably) roughly equal to the correlation coefficient.

If we assume that differences in the standard deviation are statistical noise, then any difference between the correlation coefficient and the slope of the regression line is a result of that same noise.

11:33 PM Dec 31st

doncoffin

Off the top, I have to think that the differences in those correlation coefficients aren't significant. If they persist as your sample size gets larger, then maybe.

But there's another issue. Correlation is nice. But. It would be interesting to see a scatterplot of the data, with a regression line. Specifically, how steeply sloped is the regression line between (presumably) last period's DRS and this period's? Is the (presumably positive, since r is positive) coefficient on last year's DRS close to 1? It's possible to have a correlation of .56 (or .8, or whatever) with a regression coefficient of almost any (again, in this case positive) magnitude.
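The last point, that one correlation can go with many different slopes, can be illustrated with a short Python sketch. The numbers are invented for the example, and rescaling the second variable is just one simple way to change the slope while leaving the correlation alone; it is not a claim about how DRS behaves.

```python
import math

def slope_and_r(xs, ys):
    """Return (slope, correlation) for a simple regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / ss_xx, ss_xy / math.sqrt(ss_xx * ss_yy)

# Invented values standing in for two periods of a fielding metric.
last_period = [-8.0, -3.0, 0.0, 2.0, 5.0, 10.0]
this_period = [-2.0, -4.0, 1.0, 3.0, 0.0, 6.0]

results = []
for scale in (0.5, 1.0, 3.0):
    # Multiplying the outcome by a positive constant stretches its spread.
    rescaled = [scale * v for v in this_period]
    b, r = slope_and_r(last_period, rescaled)
    results.append((scale, b, r))
    print(f"scale={scale}: slope={b:.3f}, r={r:.3f}")
```

The correlation column should be the same in every row while the slope changes in proportion to the scale, which is why a scatterplot with a fitted line adds information that r alone does not.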

7:14 PM Dec 31st