Component Theory

June 18, 2020

 

            Having made a better estimate of the runs saved by each team, we can now move on to the business of attributing those runs saved to "components" of the game, which is a step toward attributing them to individual players.   Here again, I have made many changes to the method outlined before, changes that seemed obvious as I read through the work I had done earlier and then put aside for a few weeks.

            The BIGGEST mistake that I was making, other than using a bad estimate of Team Runs Saved, was that I was frequently, and in different ways, doing something that I sometimes criticize other researchers for doing.  I was confusing measurement with evaluation.  This is measurement.  I am simply trying to measure the number of runs saved by each pitcher and each fielder.  Once I have the measurement, we can use that to make evaluations, but we’re not there yet.  

            Evaluation depends upon context, but measurement is context-neutral, or more context-neutral.   What I am trying to do here is parallel to what I did in the 1970s, when I developed Runs Created. 

            Suppose that you have two players with identical batting stats; let’s say that they each hit .270 with 12 homers, 138 hits in 511 at bats, same number of doubles, triples, home runs and walks drawn; the same number of everything—but one player did this in 1968, whereas the other one did it in the year 2000, the heart of the steroid era.   Is the number of runs created by the two players the same, or is it different?

            The number of runs they have created is assumed by the runs created method to be the same.  Of course there are minor contextual differences in the number of runs that can be expected to result from a single, a double, a triple, minor variations over time, but not REALLY; the run impact of each event does not change significantly between eras.   Of course there is a difference in the value of the two players; the player who creates 70 runs in 1968 has more value than the player who creates 70 runs in 2000, because it takes more runs to win a game in 2000.   But that’s evaluation.   This is just measurement.
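The point can be illustrated with the basic version of Runs Created, RC = (H + BB) × TB / (AB + BB).  The hits, at bats, and home runs below come from the example above; the walks, doubles, and triples are invented for illustration.  The formula returns the same figure whether the season is 1968 or 2000.

```python
# Basic Runs Created: RC = (H + BB) * TB / (AB + BB).
# H, AB, and HR come from the example in the text; BB, 2B, and 3B
# are invented for illustration.
def runs_created(h, bb, ab, doubles, triples, hr):
    total_bases = h + doubles + 2 * triples + 3 * hr
    return (h + bb) * total_bases / (ab + bb)

rc = runs_created(h=138, bb=50, ab=511, doubles=25, triples=3, hr=12)
print(round(rc, 1))  # about 69 runs created, in either era
```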

            The same with a pitcher, or with a fielder.  A pitcher who pitches 200 innings, strikes out 140 batters, walks 70, gives up 16 home runs. . . .the pitcher’s run contribution is the same, whether he does this in 1968 or in 2000, just as the batter’s is the same.  Of course, his VALUE is different, and we will adjust for that.  

            Well, we’ve adjusted for it already, because the team’s Runs Saved is different in a different Run Context.   We’ll adjust for the context when we get to the evaluation cycle.  

            But before, I was adjusting RUNS SAVED for context, and also adjusting EACH ELEMENT for context, as we went along—thus, I was, in essence, adjusting for the same thing twice.  

            Let me try to explain it this way:  WE WILL GET TO CONTEXT.   I promise you, we will not lose sight of the context in which performances occur here.  But what I am going to try to do now is put everybody on the same playing field.    I’ll explain more as we go along. 

 

            In the earlier work, I measured each team’s runs saved in twelve categories, which I had simplified to eleven by combining Stolen Bases Allowed and Caught Stealing into one number.  For the moment, I am going to undo that, moving us back to twelve categories, and then reduce those twelve categories down to seven.

            Actually, we’re going to have to reduce 13 categories down to seven.  In the work I did before, I don’t think I dealt with outfield assists.  I don’t know why; maybe I just spaced out on them, or maybe I thought they were too small to worry about and I would work them in somewhere later.  Anyway, the seven categories that we will deal with now are:

1) Strikeouts,

2) Walks and Hit Batsmen,

3) One Base Advancement Events, which are Wild Pitches, Balks, Passed Balls and Stolen Bases,

4) Hits Allowed on Balls in Play,

5) Home Runs Allowed,

6) Negative Baserunner Events, which are Double Plays, Outfield Assists and Runners Caught Stealing, and

7) Errors.

 

We’re working with 13 categories there; we’re just counting them as seven by combining events which have the same or very similar effects.   We’ll take them back apart, of course, later on.
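As a sketch, the grouping might be represented like this; the category abbreviations are mine, not from the article:

```python
# The 13 raw defensive categories, pooled into the seven components
# listed above.  Abbreviations are illustrative, not official.
COMPONENT_GROUPS = {
    "Strikeouts":                    ["SO"],
    "Walks and Hit Batsmen":         ["BB", "HBP"],
    "One Base Advancement Events":   ["WP", "BK", "PB", "SB"],
    "Hits Allowed on Balls in Play": ["H_BIP"],
    "Home Runs Allowed":             ["HR"],
    "Negative Baserunner Events":    ["DP", "OF_A", "CS"],
    "Errors":                        ["E"],
}

print(len(COMPONENT_GROUPS))                           # 7 components
print(sum(len(v) for v in COMPONENT_GROUPS.values()))  # 13 raw categories
```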

Two of those categories, Categories 1 and 6, are positives for the defense.   In my next effort to evaluate each category, I will consider ALL strikeouts to be run-prevention events, rather than strikeouts above some minimum, and I will consider all runners caught stealing to be run-prevention events.  Double plays will essentially all be positive events as well, except that we have to normalize them for context before we can do anything with them.   It’s a different problem.   Double Plays are now the only thing whose context we neutralize before we estimate the runs saved. 

The two "positive" categories, the two "High Number is Better" categories, 1 and 6, are counted against zero, whereas the "negative" categories, the Low-Number-is-Better categories, are all except one measured against a standard of five standard deviations worse than the average, the average being the average across all 120 years of the study.  So there is a question there:  is the strikeout pitcher being given an advantage, because ALL of his strikeouts count, whereas the negative categories are counted against a limit?

If anything, the strikeout pitcher is working at a DIS-advantage in this way.   The average pitcher, over time, has struck out .136 of the batters he has faced.  The standard deviation on the team level is .043.  The average pitcher is only 3.15 standard deviations from zero.  The strikeout pitcher is thus being measured against a standard which is only 3.15 standard deviations below the norm, whereas most of the other categories are being measured against 5.00 standard deviations below the norm.  (Home Runs are measured against 4.00 standard deviations above the norm.) 
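The arithmetic there, as a quick check, using the average and standard deviation quoted above (with these rounded inputs the ratio comes out a shade over 3.15):

```python
# How far below the long-term norm is a strikeout rate of zero?
avg_so_rate = 0.136  # average strikeouts per batter faced, from the text
sd_so_rate = 0.043   # team-level standard deviation, from the text

sds_from_zero = avg_so_rate / sd_so_rate
print(round(sds_from_zero, 2))  # about 3.16 -- the ~3.15 figure in the text
```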

And the same with the other "positive, high-number is good" category, which is adjusted double plays plus runners caught stealing.    Those, again, are measured against zero, but again, zero is only 3.47 standard deviations below the norm.   But there isn’t really anything I can do about that.   We’re trying to make a symmetrical measurement of an asymmetrical universe.  A pitcher gives up one hit an inning, on average, more or less.  You can’t give up less than zero hits an inning, but you can certainly give up more than two hits in an inning.    It is the nature of raw stats that they are zero-bounded on one side, but unlimited on the up side.  Zero competence has different consequences in different areas.  
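A minimal sketch of the two baselines described above, assuming the per-event run values are supplied by the caller; the function names and parameters are mine, not from the article:

```python
# "High number is better" categories (strikeouts, negative baserunner
# events) are counted against zero: every event earns its run value.
def runs_saved_vs_zero(events, run_value):
    return events * run_value

# "Low number is better" categories are counted against a floor set
# several standard deviations worse than the all-time average:
# 5.00 SDs for most categories, 4.00 for home runs.
def runs_saved_vs_floor(events, avg, sd, run_value, sds_worse=5.0):
    floor = avg + sds_worse * sd
    return (floor - events) * run_value
```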

With every strikeout counting and strikeout totals going up and up, the pitcher in modern baseball is credited with many more runs prevented by way of the strikeout than was the pitcher in, let’s say, 1928.  But suppose that you compare a pitcher in 1928 and a pitcher in 2018: not pitchers with identical stats, but rather pitchers with the same impact, each representing the fashion of his era.   The pitcher in 2018 might have 240 strikeouts in 200 innings, whereas the equally good pitcher in 1928 might have 60 strikeouts in 200 innings.   If the pitcher in 2018 has 180 more strikeouts, and if each strikeout is given a run value of .25 runs, then the pitcher in 2018 will be credited with preventing 45 more runs, even though the higher strikeout total is in part a creation of the game that he is playing, in which every hitter is trying to hit a moonshot.  
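In numbers, using the hypothetical pitchers and the .25-run strikeout value from the paragraph above:

```python
# Two equally effective 200-inning pitchers, each typical of his era.
k_2018 = 240          # strikeouts, hypothetical 2018 pitcher
k_1928 = 60           # strikeouts, hypothetical 1928 pitcher
run_value_per_so = 0.25

extra_runs_credited = (k_2018 - k_1928) * run_value_per_so
print(extra_runs_credited)  # 45.0 more runs credited to the 2018 pitcher
```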

BUT.  The 200-inning pitcher in 1928 might have given up 8-10 home runs all year.  Bob Smith, 1928, pitched 244 innings and gave up 11 home runs; he was 13-17 with a 3.87 ERA.    The comparable pitcher in 2018 might be Chase Anderson of Milwaukee, 3.93 ERA.   He struck out 128 batters in 158 innings.  In 1928 he would have had a higher strikeout rate than Lefty Grove, who led the American League in strikeouts.

            But he also gave up 30 home runs in 158 innings.   That’s maybe, what, 22 or 23 homers more than Bob Smith, pro-rated per inning?   If each home run has a run cost of 1.4 runs and Smith gets credit for NOT allowing home runs, they come out about even.    
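And the arithmetic on the home run side, pro-rating Bob Smith's home runs to Chase Anderson's innings, with the 1.4-run cost per homer from the paragraph above:

```python
smith_hr, smith_ip = 11, 244        # Bob Smith, 1928
anderson_hr, anderson_ip = 30, 158  # Chase Anderson, 2018
hr_run_cost = 1.4                   # run cost per home run, from the text

smith_prorated = smith_hr * anderson_ip / smith_ip  # about 7.1 HR in 158 IP
extra_hr = anderson_hr - smith_prorated             # about 22.9 extra homers
print(round(extra_hr * hr_run_cost, 1))             # about 32 runs charged back
```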

 

            At least, that’s the way I am going to TRY to make this work now.   I’m still working on it; we’ll see how it goes. 

 

 
 

COMMENTS (12 Comments, most recent shown first)

TheRicemanCometh
Bill, are you using 1-year park factors? I guess that would make the most sense considering what you're trying to do.
3:09 PM Jun 21st
 
FrankD
another comment I reposted:

Another example (although probably too late for anybody to read) ....

Your team ('system') has a way of detecting what the next pitch will be. They beat on a garbage can to tell the batter. The batter hits it over the fence. Is this just the batter's HR, not to be credited a little for the 'system'? And, if not, then why are members of the 'system' banned for a while?

sorry if you saw these as repeats, and I'll shut up about this now, probably to the relief of any who read these (haha) .....
6:32 PM Jun 19th
 
FrankD
I posted this on an earlier paper but I think we don't go back to see those threads ... anyway, here it is:

I'm sure I'm not explaining myself properly about the 'system' and defense. What I'm concerned about is not training. I am concerned about in-game decisions and how players are used. Maybe this can't be separated from individual performance (the data we have). In the BJHBA there is a discussion on intelligence and the decision of where to position before the ball is hit. A smart player or smart 'system' will have the player play in the (hopefully) optimal position. But what if the 'system' orders the player to a non-optimal position: that will show up in the data but it is not the player's fault, nor is it necessary to reward the player for the 'system' putting him in the best slot. It is not an effect if we assume all teams (systems) play each player in the optimal position. This effect, if large enough, may be detectable in defensive performances when players switch teams ....




6:29 PM Jun 19th
 
FrankD
like the explanation of measurement vs evaluation... measurement is the data, evaluation is the interpretation of the data. I like this approach in that some things that have varied through time, like better fields and bigger gloves, should be corrected or accounted for when the data is interpreted - these factors will just fall out when the data is normalized for a more context-free environment ....
5:28 PM Jun 19th
 
bjames
Jeffsol--

There is too little data. Much of that data now COULD be recovered from baseball history by working with the Retrosheet data. But until that is done, it's just not really there.



Hotstatrat

My apologies for the earlier response, which was inappropriate.

I think that I can explain this in a way that makes sense. I'll post that article on Tuesday. . .something like "Why we need Runs Saved Against Zero". Haven't written it yet, but I will get to it.

10:47 AM Jun 19th
 
jeffsol
Is there a reason we track balks but not baserunning events like first to third, or scoring from second on a single? If baserunning events are in the best versions of Runs Created, why wouldn’t they be here? Or is it just that this isn’t available for too much of history?
11:32 PM Jun 18th
 
hotstatrat
hotstatrat: There are teams in history, I assume, that did perform below replacement level. They shouldn't be the teams that set the bar. They should be recognized as having negative value. I presume, Bill, that you agree and will end up there.
Bill: When I started doing sabermetrics, I was confronted by hundreds of people who already understood baseball statistics, and didn't need to see any more. It is nice to know that you're still around.


I didn't mean that as the goal of your research. I was trying to get to the meaning of zero defensive value and compare it to replacement level by using a really bad team as an example. My assumption is that a team at 2 x league average runs scored (2xLARS) would be worse than replacement level and that it should have negative value (but, for now, positive measurement). In my unclear way, I was asking if you agree that it should - or, perhaps, don't care to make that assumption, yet, if ever.
9:46 PM Jun 18th
 
bjames
There are teams in history, I assume, that did perform below replacement level. They shouldn't be the teams that set the bar. They should be recognized as having negative value. I presume, Bill, that you agree and will end up there.



When I started doing sabermetrics, I was confronted by hundreds of people who already understood baseball statistics, and didn't need to see any more. It is nice to know that you're still around.
5:49 PM Jun 18th
 
bjames
hotstatrat
So, zero defensive value is the point at which we can find a defensive value that is a mirror of offensive value - as measured by runs created? Put another way, we are trying to measure runs prevented in such a way that a league's runs prevented is equal to its runs created?


Not runs created, no. Runs Scored. Runs Created are supposed to match Runs Scored, so it amounts to the same thing, but we are trying to match Runs Scored, not Runs Created.
5:10 PM Jun 18th
 
hotstatrat
So, zero defensive value is the point at which we can find a defensive value that is a mirror of offensive value - as measured by runs created? Put another way, we are trying to measure runs prevented in such a way that a league's runs prevented is equal to its runs created?

I think what evanecurb and others (like me) are groping with is that, OK, this will be interesting, but nowadays we can't unthink the replacement level concept. You do explain here that we are measuring things, not evaluating. So, I guess we are impatient. There is no "value" in giving up 7 or more runs per game, but there may be some individuals on the team and/or some components of those teams (fielding, strikeouts, HR, etc.) that might well have some value.

There are teams in history, I assume, that did perform below replacement level. They shouldn't be the teams that set the bar. They should be recognized as having negative value. I presume, Bill, that you agree and will end up there.
11:34 AM Jun 18th
 
bjames
Responding to evanecurb
I'm trying to follow along here; not getting everything you're saying. That's my problem, not yours (the fact that I don't get everything). Are the 5.00 standard deviations below norm an estimate of replacement level?


Not replacement level, no. Replacement level filters out performance below a winning percentage of .294--thus, filters out 50 games per season from each team. This level of performance filters out nothing. It's zero value, not replacement level.
10:36 AM Jun 18th
 
evanecurb
I'm trying to follow along here; not getting everything you're saying. That's my problem, not yours (the fact that I don't get everything). Are the 5.00 standard deviations below norm an estimate of replacement level?
10:12 AM Jun 18th
 
 