
The Common Scale

November 28, 2007

            This is not an essay about Range Factors, or about Jimmy Rollins, or about myself, or about the MVP Award, or about the way sportswriters argue; I am merely setting the stage.

            1)  Jimmy Rollins won the MVP Award,

            2)  Somebody attacked this by pointing out that his Range Factor was average, according to our Handbook,

            3)  Somebody else, defending the vote for Rollins, pointed out the “flaws” in the concept of Range Factor,

            4)  Somebody else fired a shot at me for proposing, or publishing, or inventing Range Factors, or something. 

            I suspect my reputation may survive this.   There are several issues in the foreground there that I want no part of, at least at the moment; if I get time I may take a hard look at the NL MVP race.  What I wanted to get to is an issue in the background, which is:  the relative accuracy of various stats.   It is Range Factors that occasion this dispute, so let’s start with them.  Certainly there are many limitations to the accuracy of Range Factors, among them:

            1)  Some pitching staffs get more strikeouts than others, thus have fewer balls in play,

            2)  Some pitching staffs have more ground ball pitchers than others, even if the strikeouts are even,

            3)  Some pitching staffs have more left-handers than others, which causes more balls to be hit to that side of the infield,

            4)  Some teams have more runners on base against them, which gives the shortstop more putouts.   If the team’s pitchers walk more people, for example, the putouts by the shortstop and second baseman will increase because there are more plays at second base.

            5)  Some teams have more good defensive players competing for the same balls in play.

            6)  Some features of defensive excellence are not measured by range.

            And many others. . .I can’t think of any others right now, but let’s assume there are many others.   Still, Range Factors must have some level of accuracy, do they not?   All other factors being equal, a player with more range would make more plays than a player with less range, it would seem to me, therefore would have a higher Range Factor.   The measure does not bear a completely random relationship to that which it attempts to measure.

            The person who attacks Range Factors in this context might counter by citing, let us say, Matt Holliday’s RBI count, or Cole Hamels’ Won-Lost record, or how many hit records Roy Orbison had; MVP discussions tend to be free-ranging.   But how many extraneous factors influence the RBI count?   Let’s see:

            1)  The on-base skills of the batters in front of him,

            2)  The speed of the batters in front of him,

            3)  His position in the batting order (a #4 hitter has more RBI opportunities than a #2 hitter, even if the #2 hitter were to have the same hitters coming up ahead of him),

            4)  The park effects,

            5)  The hitters coming up AFTER this hitter in the lineup, which affect, at a minimum, how many times he is intentionally walked,

            6)  The tendency of the manager to use the bunt or the steal to get runners into scoring position for this hitter.

            And many others. . .let’s assume there are many others.   Certainly I would not suggest that Jimmy Rollins would have a 4.41 Range Factor if he were playing shortstop for Colorado, but on the other hand, you would not suggest that Matt Holliday would drive in 134 runs if he were hitting leadoff for Philadelphia, would you?  The question I would pose is, why is it OK to point out that Matt Holliday has 134 RBI, but not OK to point out that Jimmy Rollins has a 4.41 Range Factor?  

But actually, that’s not the question I was trying to get to, either; that’s still in the foreground. The question I am trying to get to is:  Is there some way to actually state the accuracy of different statistics along a common scale, so that one may be cited in comparison to another?

            There are NO perfectly accurate stats; there are problems with all of them.  Even the stats that we spend years trying to perfect are still liable to biases and oversights.  Some are more accurate than others.  The discussion suffers, it seems to me, because there is no common reference to say how accurate any stat is.  This enables people to dismiss stats that they don't like by citing some problem or other, while treating other stats--which may be half as reliable--as if they were without sin.

                  This is a reader-participation forum, and I am not posing a rhetorical question here; I am sincerely asking you to tell me what you think.  Is there some way that we could organize a discussion of the relative accuracy of different metrics, such that, at the culmination of that discussion, we could say that “David Eckstein had a higher Range Factor than Jimmy Rollins, a stat that is 41% accurate in measuring a fielder’s range,” or “Matt Holliday drove in more runs than Scott Podsednik, a stat that is 11% accurate in measuring a hitter’s actual ability to drive in runs.”

            Of course, the argument wouldn’t be stated that way exactly; the facts about the relative accuracy of different stats would emerge in one venue, and the MVP argument would proceed in another venue, but still, the information would transfer because the same people would read both articles while listening to Roy Orbison music.  Also, of course, the reliability of the stat would depend on the degree of separation.  A Range Factor advantage of 5.00 to 4.00 would have a higher degree of reliability than a Range Factor advantage of 4.52 to 4.51.  These are foreground issues; I am still trying to get to a background question.   Is there some way to actually state the accuracy of different statistics along a meaningful common scale?

            I don’t know the answer to that question; that’s why I need your help.  But here are five thoughts about the issue.

            1)  An accuracy scale should run from 0 to 100%.   This is less obvious than it seems.   An entirely meaningless stat is still going to point to the better player 50% of the time if you compare two players—thus, we could develop a scale that runs from 50 to 100.   We shouldn’t.  

            Also, some stats which were relied on in the past have been found to be negative indicators, so these could be scored as having negative accuracy.   But that should not be done, because if a stat indicates something negatively, that is still an indication.  There is really no such thing as “negative reliability”.
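            The 50% floor in point 1 is easy to verify with a small simulation.  The sketch below is mine, not anything from the article: the “stat” is pure noise, drawn identically for both players regardless of their true ability, and it still picks the better player half the time.

```python
import random

def pick_better(trials=100_000):
    """How often does a purely random 'stat' point to the better player?
    Both players draw from the same distribution regardless of true
    ability, so the stat carries no information at all."""
    hits = 0
    for _ in range(trials):
        better_player_stat = random.random()
        worse_player_stat = random.random()
        if better_player_stat > worse_player_stat:
            hits += 1
    return hits / trials

print(pick_better())  # hovers around 0.50 -- the floor for any stat
```

Which is why a scale anchored at 0 rather than 50 has to be a deliberate choice: a coin flip already “scores” 50.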

            2)  The general accuracy of a stat should be stated at an interval of one standard deviation.    In other words, it is useful to be able to say that “the accuracy of batting average as an indicator of overall batting ability is 76%”, although obviously the reliability is greater when comparing Ichiro Suzuki to Eric Munson than it is when comparing one of the Molina brothers to another one.  This difficulty can be overcome by using the standard deviation as a reference point.   If the standard deviation of batting average is 28 points, then what it means to say that batting average is 76% reliable is that it is 76% accurate at an interval of 28 points.
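            One way to make that operational, as a sketch: read “76% accurate at an interval of 28 points” as the probability that the observed stat correctly orders two players whose true abilities differ by one standard deviation.  The 28-point gap and the 20-point noise level below are illustrative assumptions, not measured values:

```python
import random

def ordering_accuracy(true_gap, noise_sd, trials=100_000):
    """Probability that an observed stat correctly orders two players
    whose true values differ by true_gap, given per-player noise."""
    correct = 0
    for _ in range(trials):
        obs_better = true_gap + random.gauss(0, noise_sd)
        obs_worse = 0.0 + random.gauss(0, noise_sd)
        if obs_better > obs_worse:
            correct += 1
    return correct / trials

# A one-standard-deviation gap of 28 points of true batting skill,
# with 20 points of observational noise per player (invented numbers):
print(ordering_accuracy(0.028, 0.020))
```

Shrink the gap toward the Molina-vs.-Molina case and the accuracy slides toward 50%; widen it toward Suzuki-vs.-Munson and it climbs toward 100%, which is exactly why the fixed one-standard-deviation interval is needed as a reference point.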

            3)  The accuracy of any stat can only be stated with respect to a stated object, although in some cases that stated object could be assumed.   Range factors, for example, might be 70% accurate as a measure of defensive range at a position, but only 32% accurate as a measure of overall defensive ability.  

            We would thus have to agree on what each stat is supposedly measuring.  Are RBI supposed to be a measure of overall hitting skill, or the ability to create runs, or merely of the ability to drive in runs?   Is winning percentage supposed to be a measure of the pitcher’s ability to win, or of something more limited and specific?

            4)  The accuracy of some stats can be assessed very quickly, if we can agree to a scale and agree to a stated object.   It would be relatively easy to calculate how reliable batting average was as a predictor of some other skill, if we agree as to what the object was and what the operational meaning of the scale was.

            5)  Even things like Range Factor could be assessed pretty accurately by the use of models.    Actually it would be a fun new project for model-builders, I think. . .put a player with “known” range on one team, then on another, and check to see how much his Range Factor changes with the conditions of the model.   It would be an interesting project.
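            A toy version of such a model might look like the sketch below.  The figures in it are assumptions for illustration only: the fielder converts a fixed share of his chances, and roughly a fifth of the team’s balls in play are hit near the shortstop.

```python
import random

def simulated_range_factor(true_range, balls_in_play_per_game, games=150):
    """Toy model: a fielder with 'known' range converts a fixed share
    (true_range) of the balls hit near him into plays; teams differ in
    how many balls in play their pitchers allow per game."""
    plays = 0
    for _ in range(games):
        # Assume roughly a fifth of the team's balls in play are hit
        # near the shortstop (an invented figure, for illustration).
        chances = sum(random.random() < 0.2
                      for _ in range(balls_in_play_per_game))
        plays += sum(random.random() < true_range for _ in range(chances))
    return plays / games  # Range Factor: plays per (nine-inning) game

# The same fielder on a high-strikeout staff vs. a pitch-to-contact staff:
print(simulated_range_factor(0.80, 24))
print(simulated_range_factor(0.80, 30))
```

The same fielder, with identical “true” range, posts a visibly different Range Factor depending only on how many balls in play his pitchers allow, which is the point of the exercise.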

            Gotta go. . .Roy’s going to do “Pretty Woman”.

 
 

COMMENTS (7 Comments, most recent shown first)

prezzpac
studes mentioned this, but I think it's worth reiterating. It's not an easy task at all to say what some of these stats are trying to measure, unless we resort to banalities. I would argue that RBIs measure only the number of runs batted in, for which they are 100% accurate. I have no idea what else they could be measuring, or how you would know.
Similarly, I'm not sure Range Factor really measures anything except the number of put-outs and assists per 9 innings. Maybe I'm being obtuse here.
1:35 PM Mar 22nd
 
wovenstrap
Didn't you do this already? I distinctly remember an Abstract-era article assessing statistics in which there were three criteria from 1 to 10 and multiplying them together gave a range from 0 to 1000.
7:15 PM Mar 7th
 
rpriske
I realize this isn't what you are getting at, but isn't what you really need some sort of measurement that can equalize the different parts of the game so that we can see how much value the player has contributed to his team winning?

If only there was such a thing...
1:00 PM Feb 22nd
 
studes
I agree that regression from the mean is a good tool. It basically addresses the sample size issue. But that's only one of the issues being raised here, right? What about Range Factor or RBI's, and the impact of external factors. I understand that regression to the mean can address those factors to some extent, but how well? And, fundamentally, can we even agree what some of these stats are intended to measure? What does RBI measure, anyway?
7:27 PM Feb 21st
 
tangotiger
And here's the pitching thresholds:
http://mvn.com/mlb-stats/2008/01/06/on-the-reliability-of-pitching-stats/
2:46 PM Jan 9th
 
tangotiger
I gave you the wrong link. This is the correct one:
http://mvn.com/mlb-stats/2007/11/14/525600-minutes-how-do-you-measure-a-player-in-a-year/
11:31 AM Nov 30th
 
tangotiger
Bill,

From a technical standpoint, I always set the reliability of a metric based on how much regression toward the mean is required. For example, a player with around 200 PA will have his OBP regressed 50% toward the mean. Put another way, if you take a few hundred players' 200 PA and correlate that to the same players' next 200 PA, you'll get, roughly, r=.50.

For things like K/PA, or GB/PA, the r=.50 level occurs at a much lower level than 200 PA. For things like reached base on error per PA, the r=.50 level occurs much higher. (If things were really random, you'd never get an r=.50, much less r=.001.)

Where Range Factor falls, I don't know. But, it's easy enough to test.

Along these same lines:
http://mvn.com/mlb-stats/2007/11/27/stats-204-the-proximity-matrix-or-re-visioning-similarity-scores/
Pizza Cutter does the same thing (but he looks for it at the r=.71 level, which is r-squared=.50). I highly suggest you read it.

The real power here comes from being able to regress toward the mean anything, once you know the PA level at r=.50.

r=PA/(PA+x)

If r=.50 at PA=200, then x=200.

So, r=PA/(PA+200)

If you have 1800 PA, then r=.90, and regression toward the mean is 1-r=.10. That is, if you have 1800 PA, then your player's OBP in the future will be 90% of his OBP in those 1800 PA, and 10% of the league average.
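[Editor's note: the rule stated above is simple enough to put in code. This is a sketch of it, with x = 200 taken from the OBP example in the comment, not as a general constant:]

```python
def regression_toward_mean(pa, x=200):
    """Rule of thumb: reliability r = PA / (PA + x), where x is the
    PA level at which r = .50 (x = 200 is the OBP example above)."""
    r = pa / (pa + x)
    return r, 1 - r  # (weight on player's own OBP, weight on league average)

print(regression_toward_mean(200))   # r = .50: regress halfway to the mean
print(regression_toward_mean(1800))  # r = .90: regress 10% toward the mean
```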

Tom

9:57 AM Nov 30th
 
 