By Bill James

May 28, 2008

In an article posted here a few days ago, I discussed the issues of whether and how we should count “RBI Opportunities”. My proposal was to count RBI opportunities as the sum of actual RBI and missed RBI opportunities, with missed RBI opportunities being defined as:

1.00 for each runner left on third base with less than two out, plus

.70 for a runner left on second base or left on third base with two out, plus

.40 for a runner left on first base.

No missed RBI opportunities were counted, however, when no out was made. I asked readers to comment on that proposal, and the purpose of this article is to carry forward the discussion by responding to your comments.

The first responder, mskarpelos, said that he liked the general approach, but that “I suggest using adjustments based on the famous 24-node finite state Markov Chain.” I tend to regard the 24 states analysis sort of the way I regard music at the ballpark: I know that we need this, but I’m not sure that we need quite so much of it.

Statistics, like all other products, must serve the needs of the consumers as well as the inclinations of the statisticians. The public may eventually get the general idea of a 24-states analysis, but you’re asking an awful lot, to ask the public to sit through calculations of that length in order to get to something as simple as a count of RBI opportunities.

I have invented many stats that are now in common usage, and I have invented many more that are or have been forgotten. It is my view that, for a stat to have a *chance* to succeed, we have to be able to explain it to people whose patience with statistical analysis is extremely limited. We have to shoot for the day on which a radio announcer can say casually “

So I think what you’re really saying. . .I know you don’t mean it this way. . .but I think what you’re really saying is “Screw the consumer; let’s just make it up the way we like it.” But if you do that, you very quickly become irrelevant to the general discussion. I have worked hard to *be* relevant.

The same general issue arises in respect to Tangotiger’s suggestion that the opportunity loss from a batter failure with a runner on third, no out, is actually much less than the opportunity loss from a failure with a runner on third with two out, thus that the grouping should not be the 1st and 2nd outs of the inning against the 3rd, but rather, the 1st and 3rd outs against the 2nd.

I very much appreciate Tango’s research on the issue of how my outline relates to the run probabilities chart, and I appreciate the generally supportive comments. But that said. . .aren’t you kind of looking at the issue from the wrong angle here? Measuring changes in run states is a very useful exercise, but it doesn’t happen to be *exactly* what we’re doing right now. *Exactly* what we’re doing right now is measuring missed RBI opportunities.

A batter who strikes out or pops out with a runner on third and no one out has missed a very easy RBI opportunity—the same as he has if he strikes out or pops out with one out. True, the “damage” to his team, by the Markov Chain, may be greater with one out—just as one RBI may be of much more value than another RBI--but what the batter has done is the same. We don’t adjust RBI counts for the value of each RBI, at least while we are in the process of simply counting RBI. Does it really make sense to try to adjust **Missed** RBI for the value of the RBI that are missed?

Setting that issue aside, I go back to the earlier problem. I visualize myself on the radio, trying to explain this concept to some afternoon radio guy in

The suggestion to park-effect the stat, again, is a suggestion to complicate the computation to such an extent that no one will ever understand what we’re doing. I will also point out that this is contrary to standard statistical practice. Everything is related to park effects—including, for example, RBI themselves. But when counting RBI, we don’t count an RBI as .80 RBI in Coors Field and 1.15 in Shea Stadium—nor should we. We count them first, then we adjust for park effects. Same thing here: count first, adjust later.

Trailbzr asks “What is the purpose of the stat?”, to which the answer is “to measure each hitter’s RBI opportunities.” Sorry. . .I thought that was clear.

It is worth making the point: it is *not* the purpose of this stat to measure how good a hitter someone is, or to measure what his value is compared to another hitter. We are simply trying to put the RBI stat in the context of RBI opportunities—that’s all.

The discussion about Mike Schmidt seems to be leap-frogging the research, and I probably shouldn’t comment on that.

Martin suggests that “I suspect that this will go over well with the statheads and almost nobody else. Remember how much people despised GWRBIs? Well, they’ll view this the same way.”

This is just my opinion, but I think that kind of defeatist thinking is fantastically wrongheaded. First of all, I am very puzzled by how you can connect this to GWRBI, or why you would do this. Out of a universe of a billion failed stats and a few hundred successful ones which have been introduced since then, why did you choose to link RBI opportunities to a stat with which it seems to have virtually nothing in common?

Game Winning RBI failed for the best possible reason: it was horribly designed. It richly deserved to fail. Most new stats fail (including mine, many of which, in retrospect, have obvious design flaws), and people who work with me know that I am perpetually over-optimistic about the chances of making a new stat work. We’ll see, I guess. My feeling is that it *can* succeed, if we do enough things right.

Gregg Borgeson pointed out two missing elements in the stat: 1, that we shouldn’t charge an RBI opportunity for a successful sacrifice bunt, since the bat has been taken out of the batter’s hands, and 2, that we should do something with double play balls. He is correct on both points. . .he had a suggestion for what to do with GIDP, which I have *exactly* adopted, but there is a need to do something there.

Ralph C. (Cramden?) asked what happens if multiple players are stranded. We just add up the totals. . .if you bat with the bases loaded, nobody out, and strike out, that counts as 2.1 missed RBI—1.00 for the runner on third, .70 for the runner on second, .40 for the runner on first.
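The adding-up described here can be sketched in a few lines of Python. This is only an illustration of the original three weights, not an official implementation; the function name `missed_rbi` and its arguments are hypothetical, and it assumes that "less than two out" refers to the number of outs when the batter came to the plate.

```python
# Weights from the original proposal: charge for a runner left on each base.
WEIGHTS = {"first": 0.40, "second": 0.70, "third": 1.00}


def missed_rbi(runners_left, outs_before):
    """Missed RBI charged for one out-making plate appearance.

    runners_left: bases ("first", "second", "third") holding runners
                  who did not score on the play.
    outs_before:  outs when the batter came up (an assumption about how
                  "less than two out" is read).
    """
    total = 0.0
    for base in runners_left:
        if base == "third" and outs_before == 2:
            total += 0.70  # runner on third with two out counts like second
        else:
            total += WEIGHTS[base]
    return round(total, 2)


# Bases loaded, nobody out, strikeout: 1.00 + .70 + .40
print(missed_rbi(["first", "second", "third"], outs_before=0))  # 2.1
```

Rounding to two decimals simply keeps floating-point noise out of the running totals.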

OK, there’s one more issue here: RBI opportunities when somebody could hit a home run. My proposal before ignored these. I think there are four reasons not to ignore them:

1) Parallel construction. If a player gets an RBI for a solo home run, why is there no “opportunity” for an RBI if the bases are empty?

2) Mathematical consistency. We had this:

Runner on Third 1.00

Runner on Second .70

Runner on First .40

Doesn’t this fit kind of perfectly:

Runner on Third 1.00

Runner on Second .70

Runner on First .40

Runner at Home .10

3) Suppose that you compare these two players:

| AB | H | 2B | 3B | HR | BB | SO | RBI |
|---|---|---|---|---|---|---|---|
| 400 | 136 | 32 | 4 | 24 | 70 | 50 | 90 |
| 500 | 136 | 32 | 4 | 24 | 40 | 100 | 90 |

In that case, are these two players the *same* as RBI men, or are they different? Were their RBI opportunities truly the same, or were they different?

They were different. The player who had 100 more at bats had 100 more chances to do something. It seems to me that it is inappropriate to entirely ignore those, even in the extreme and unusual case where the second player makes all of the marginal outs with the bases empty.

4) With all due respect to the people who wanted to drag the Markov chain into this, it seems to me that that’s really not the relevant “balance” that we should be looking for. What we should be looking for, it seems to me, is *to balance the numbers so that an average hitter has essentially the same ratio of RBI to missed RBI in each situation*. I don’t KNOW, but I would assume we’re pretty close to that. That was my intent, anyway.

Suppose that the second hitter above, with 500 at bats. . .let’s assume that he is kind of an average hitter, and let’s assume that his plate appearances are split:

135 with men on third

135 with men on second

135 with men on first

135 with the bases empty

Unrealistic, of course, but for illustration, and let’s assume that his performance is the same in each group—34 hits, 8 doubles, a triple, 6 homers, 10 walks.

With men on third, the player would probably drive in about 49 runs (one for each hit, plus 6 for the homers, plus about 7 that might score on fly balls and ground balls). He would probably make about 84 outs that didn’t produce an RBI, for which he would be charged with 76 missed RBI (assuming that two-thirds of these outs would be the first or second out). Thus, his expected RBI percentage would be between .350 and .400 (49 for 125, more or less).

With runners on second he probably would drive in about 36 runs (12 on the homers, one on the triple, 8 on the doubles, and about 15 on the singles) while making 91 outs that didn’t produce a run, for which we would charge him about 63.7 missed chances. Thus, his expected RBI percentage would be in the same general range, about .360 (36 for 99.7). You get different numbers when you estimate for different types of hitters.

With runners on first he probably would drive in about 19 runs (12 on the homers, one on the triple, 6 on the doubles) while making the same 91 outs that didn’t produce a run, for which we would charge him 36.4 missed chances. Thus, his expected RBI percentage would be in the same range—about .350 (19 for 55.4).

But with the bases empty, if we don’t charge anything for outs there, he would have 6 RBI vs. no missed RBI—a percentage of 1.000. That doesn’t seem right, and it causes bases-empty opportunities to distort the totals. If you charge him .10 missed RBI for each of his 91 outs, he’s back in the same range—about .400 (6 for 15.1). That seems to me to be better.
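The arithmetic in the four cases above can be checked with a short Python sketch. The RBI estimates and the 76 missed RBI with a man on third are taken straight from the text; the helper name `rbi_percentage` is hypothetical, and "RBI percentage" is assumed to mean RBI divided by RBI plus missed RBI.

```python
def rbi_percentage(rbi, missed):
    """RBI divided by RBI opportunities (RBI plus missed RBI)."""
    return rbi / (rbi + missed)


OUTS = 91  # 125 at bats minus 34 hits in each situation

situations = {
    # situation: (estimated RBI from the text, missed RBI charged)
    "runner on third": (49, 76.0),          # the text's estimate of 76
    "runner on second": (36, OUTS * 0.70),  # 63.7
    "runner on first": (19, OUTS * 0.40),   # 36.4
    "bases empty": (6, OUTS * 0.10),        # 9.1
}

for name, (rbi, missed) in situations.items():
    pct = rbi_percentage(rbi, missed)
    print(f"{name}: {rbi} for {rbi + missed:.1f} = {pct:.3f}")
```

Under these estimates all four ratios land between roughly .34 and .40, which is the balance the .10 bases-empty charge is meant to preserve.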

OK, this is what I’m going to do. . .and thank you all for your input. We’re going to figure RBI opportunities in this way:

1) RBI Opportunities are the sum of actual RBI and Missed RBI Opportunities.

2) Missed RBI Opportunities are tallied as follows:

1.00 for a runner left on third base with less than two out **or** for grounding into a double play,

.70 for a runner left on second base or for a runner left on third base with two out,

.40 for a runner left on first base,

.10 for a bases-empty out, HOWEVER

No Missed RBI Opportunities are charged when the batter does not make an out or hit into a forceout, and

No

If a player leaves multiple runners on base he is charged with each missed opportunity. If a player Grounds into a Double Play with other runners on base, he is also responsible for the other missed opportunities. He is *not* charged with a missed RBI opportunity, however, for a runner who scores from third on the Double Play—no RBI, but no missed RBI for that runner, since that runner has scored.
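A minimal Python sketch of these rules follows. It covers only the rules spelled out above, and several things in it are assumptions rather than part of the definition: the function name and argument names are hypothetical, the out count is taken as of the moment the batter came up, and the double play is modeled as a flat 1.00 charge on top of whatever is charged for runners still on base afterward.

```python
def missed_rbi_opportunities(made_out, runners_on, runners_left,
                             outs_before, gidp=False):
    """Missed RBI opportunities charged for one plate appearance.

    made_out:     True if the batter made an out or hit into a forceout.
    runners_on:   bases (1, 2, 3) occupied when the batter came up.
    runners_left: bases of runners still on base after the play;
                  a runner who scored is not listed.
    outs_before:  outs when the batter came up (assumed reading of
                  "less than two out").
    gidp:         True if the batter grounded into a double play.
    """
    if not made_out:
        return 0.0   # no out, no missed opportunity
    if not runners_on:
        return 0.10  # bases-empty out
    total = 1.00 if gidp else 0.0  # flat charge for the double play
    for base in runners_left:
        if base == 3:
            total += 1.00 if outs_before < 2 else 0.70
        elif base == 2:
            total += 0.70
        else:
            total += 0.40
    return round(total, 2)


# Bases loaded, nobody out, strikeout.
print(missed_rbi_opportunities(True, [1, 2, 3], [1, 2, 3], 0))     # 2.1
# Double play with runners on first and third; the runner scores.
print(missed_rbi_opportunities(True, [1, 3], [], 0, gidp=True))    # 1.0
```

Passing the runners who remain on base, rather than the runners who started there, is what keeps a runner who scores on the double play from being charged.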

Thanks again.

©2017 Be Jolly, Inc. All Rights Reserved.

## COMMENTS (20 Comments, most recent shown first)

bbmarks

I think the 0.4 penalty for a runner on first is perhaps too high and discriminates against singles hitters, because if they hit a single and advance the runner two bases, they get nothing, but if they make an out they get a steep penalty. Suppose you have Wade Boggs vs. Dave Kingman and each have 100 PA with a runner on first. Then suppose Boggs gets 30 1B and 20 walks and Kingman has 8 HR and 92 strikeouts. Boggs will have 0 RBI and 20 "missed opportunities". Kingman will have 16 RBI and 36.8 MO. Kingman's percentage will be .303 even though he hit like a ridiculous idiot and Boggs' % will be .000, even though he hit like a champion.

6:37 PM Jun 2nd

bbmarks

I think that the definition of "not making an out" should include batters reaching base on error or fielder's choice in which no actual out is made. My opinion is that ROE should be considered a positive play. It's bad enough that we don't give a batter an RBI if he smashes a hard ground ball that the 3rd baseman misplays and a run scores. We certainly shouldn't penalize a batter for NOT driving in a run on that type of event.

6:22 PM Jun 2nd

mskarpelos

I feel compelled to defend my honor. I would never say anything like "Screw the consumer ...". I had no idea that Bill was trying to design the stat for a general audience. I thought it was for a very specific audience--namely those of us who pay $36 a year to subscribe to BillJamesOnline for the privilege of interacting with Bill James on a regular basis. I believe this self-selection process results in a far more sophisticated audience--one that is perfectly capable of understanding Markov Chain analysis and one that is certainly more knowledgeable than "some afternoon radio guy in Mobile, Alabama".

Since Bill's intent is to popularize the stat beyond just BillJamesOnline to a general audience, I understand perfectly well the need for a simpler stat. I work in enterprise software development, so I deal with similar compromises every release. (Do we add functionality that only "power users" need or do we keep the interface simple for a less sophisticated but more numerous user base?)

So, for the record, if Bill wants to make 'RBI Opportunities' available to a general audience, I'm very much on board with a definition that doesn't depend on finite state Markov chains.

4:19 PM Jun 2nd

THBR

Holy Hannah! I'm the guy who asked the original question, and I've been away from a computer basically since the first article came up (visiting 3 grandchildren in two different West Coast locations -- I'm a Jersey guy), and I am ASTOUNDED at the discussion that ensued -- and also gratified, because I would never have thought of all these ramifications myself. Thank all you commenters, and ESPECIALLY thank you Bill James for providing the site, the feedback, and the opportunity for discussion. This is a FABULOUS site -- best 10 cents a day I've spent since I was 8 years old (more than half a century ago!) and could get two 2-ounce candy bars for my 10 cents.

6:36 PM May 31st

tangotiger

Right, I wasn't taking a position. I simply saw the objective, and worked within those guidelines and constraints, and came up with the holes as best I could. As you said, just trying to be helpful, like a good little soldier.

Joe Posnanski (good friend of Bill James I think) does a simpler version of the Tom Ruane article I recently posted:

http://joeposnanski.com/JoeBlog/2008/01/16/rob/

As I said elsewhere (or here?), no one's going to know what the average RBI rate is in Bill's scenario, and I don't know that the public will be educated on it anyway. Joe's list (as is Tom Ruane's) I find more appealing, since it gives you number of RBIs relative to average. Everyone will understand "who drove in 20 more runners than an average hitter would have, given his opps". That's me though.

Bill seems set on proceeding along his lines, and other than the few wrinkles I noted that he should change, that's the best I think I could do with it.

4:28 PM May 31st

wovenstrap

Actually, Tango: rereading -- I see you were basically just trying to be helpful. Second time I've had to trim a prickly response. But it remains true that our aims and assumptions are very very different.

3:22 PM May 31st

wovenstrap

Tango: Since you are advocating something far more complicated and forbidding to the average baseball fan than Bill's system, it's difficult for me to understand why I should care how you calculate runners stranded. I'm proposing using just that, or a simple percentage derived from that. I get that you don't really care what stats are used in e.g. a radio broadcast, and I'm sure your case for Markov Chains is sound, in and of itself. I'm not trying to be unpleasant, but if you say "I'm actually not advocating anything for, or not for, the mainstream" then your contribution about runners stranded only has relevance to your conversation, which I'm not having, just as you are not having a conversation about a stat fit for radio talk show hosts. It seems to me that Bill is actually having my conversation, as he repeatedly invokes the man running an afternoon call-in show in Mobile, Alabama.

3:18 PM May 31st

Bucky08

Why would you charge an opportunity on a failed sacrifice attempt when the theory behind taking the bat out of the hitter's hand still applies?

12:26 PM May 31st

tangotiger

Martin:

1. I'm actually not advocating anything for, or not for, the mainstream. I don't have a horse in that race.

2. My runners stranded is just that. Nothing about how to divide it with anything else (yet).

3. I'm working within Bill's confines, and my posts simply attempt to make the best of what he's trying to do. My best efforts, in terms of trying to get the scales right (30% success rate), shows that there's a limit here, because of the runner on 3B and less than 2 outs. While Bill's scaling framework is better than a simple RBI per opps, it has its limits.

4. The Markov approach, done by Tom Ruane with a minor tweak, is the one I'd advocate. I'd prefer to see something like "drove in 30% more runners than an average hitter". So, a scale where 100 is average, and work from there, would be ideal (for me anyway). To me, the only scale that can work, is 100 or .500. If you don't know what's average, then how do you know what's good?

5. As an aside, I hate RBI. RBI-HR (runners driven in) is what should be pumped. That's for another day (or if you email me, I can send you to my article on it... Google tangotiger and you'll find my email address.)

2:29 PM May 30th

wovenstrap

Tango: I'm a little confused -- you are not advocating propagating a simple RBI/(RBI + Runners-Stranded) statistic on the broadcasting booths of America, yes? You seem to want a definition of Runners Stranded as a means to a more complicated end like calculating RBI Opps here. If I'm misunderstanding you, I apologize.

I'm saying the path to popular consumption is simply to propagate that simple counting/division statistic. Number of runners driven in divided by Available number of runners on base who could potentially be driven in. Or something very close to that. That's a stat that might change people's minds about ... a leadoff player's clutch ability or whatever.

Having been a wet blanket on the complexity of calculating RBI Opps, I must concede that if the resultant stat is simply an algebraic sum of all baserunners using a (1.0/0.7/0.4/0.1) scale, it might not be so very forbidding. I still think it's much closer to BP comprehension than afternoon talk-radio consumption.

11:51 AM May 30th

tangotiger

Here's my "left on base" or "runners stranded" definition:

Count the number of runners that (a) were removed from the bases without scoring, (b) didn't move a base, with an out on the play, and (c) still on base on the third out. That's what I would track.

6:33 AM May 30th

tangotiger

I didn't see your note about "percentage of runners stranded", but in the comments section of, I think, that article, I gave a very easy definition. That's the one I'm using.

6:31 AM May 30th

wovenstrap

I am not often accused of defeatism, but whatever. As it stands right now, I still think this is an essentially technical, analytical statistic that for some reason is being proposed for use by radio announcers in Mobile, Alabama. I think in that sense, the stat is a complete hybrid that cannot please the people who want to use it for Markov Chain analysis, or the people who are just listening to the ballgame on a hot summer night.

I think it was Rob Neyer, or perhaps Bill James, who observed that the most common question fielded by rulebook mavens is "Please define the save rule." This thing is much more complicated than the save rule, with no prospect of getting less complicated. What is .40 of an RBI? The average baseball fan will just go "tilt" at that. There's no way for the radio guy to explain that in thirty seconds.

Bill ignored the second part of my comment to the last article, which was a plea to simplify this to something more like "percentage of runners stranded," the obverse of the stat used for relievers who inherit runners. Is that currently counted? I have no idea -- but like Baserunner Kills or Blown Saves or even Houdinis, that would be a stat that could be understood by everybody in the audience in a nanosecond. This thing isn't that.

I admire the desire for a sensible stat -- but for what you're doing now, you have to forget the guy in Mobile, Alabama, at least for a little while. That doesn't mean bringing in Markov chains -- if you leave out the radio audience and are aiming for lay readers of BP (that's about me), the current level of complexity is just right.

2:27 AM May 30th

tangotiger

Right, it is a sort of "batting average for RBI". Basically, it seems that Bill is looking to get some context into the RBI, without biasing the stat based on the number of opps with runner from 1B and 3B. To the extent that he wants to reach that objective and get that kind of scale, then he's got most of it pretty well, except for the runner on 3B and less than 2 outs, where that will bump up everyone's average, some more than others. It's unfair, but not as unfair as simply counting all opps the same. So, I'd call this a pretty good effort in that regard.

7:27 PM May 29th

rtayatay

You know, in retrospect, I think we're over-complicating the thing. What are we really trying to do with this stat? It seems to me more of a basic 'batting average for RBI's' that the guy on the street can understand. Statheads *know* that if Ichiro hit 3rd, he'd drive in 100, but people in general don't think that way. If you're reading this column, you probably know who the best hitters in baseball are, regardless of how many RBI's they had. If it's kept simple, it's a quick & easy stat that can be easily understood, has a chance to get in the mainstream discussion, and sheds some light on a misused/overused RBI number. Sometimes less is better.

3:29 AM May 29th

tangotiger

Finally, for runners on 1B, counting misses as the runner remains at 1B and an out is made, we set the misses to 0.30, and our success rate is 29%.

In order to get the success rate as 60%, the misses will count as .08.

***

To summarize: set runner on 3B, 2 outs as 1.00, runner on 2B at 0.70, runner on 1B as 0.30. This will give you a success rate of roughly 30%.

Unfortunately, the runner on 3B and less than 2 outs will bias all this, as its success rate is 60%. So, anyone lucky enough to get lots of these opps will get an unfair advantage.

But, our point here was to try to remove the bias to begin with.

8:57 PM May 28th

tangotiger

Repeating with runners at 2B, and setting the missed opp as 0.7 (either leaving him on 2B or 3B while incurring at least an out), the success rate is 27%.

If we only count the missed opp if the runner remains at 2B with the batter making an out (i.e., productive outs where the runner moves to 3B is discarded), then the success rate is 31%.

Going back to the runner on 3B with 2 outs: if we set the misses as 1.00, the success rate is 29%.

So, in terms of scaling, it makes more sense to count the runner on 3B with 2 outs as 1.00 for the misses, if we count the runner on 2B as 0.70 for the misses. This of course means we take a hit on the runner on 3B with less than 2 outs as biasing the metric here.

Alternatively, if we want the scale such that the success rate is 60%, then the misses for runner on 2B would count as 0.21.

8:49 PM May 28th

tangotiger

Let's look at the runner on 3B situation. Here's what I did. I looked at all games from 2000-2007, noting if there was a runner on 3B, and if he scored, stayed on 3B without the batter making an out, or if there was an out on the play. (I didn't check if the batter got an RBI on the play or not... just if the runner scored. Just a little sloppiness on my part to get this done.)

Presuming I did this right, I get, with 0 outs: 13585 runners from 3B scoring, 8380 times a batter made an out, and the runner did not score, and 2255 where there were no outs, but the runner stayed on 3B. We discard this last one, and we get a success rate of 62%.

Repeating with 1 out, and it's: 31662, 23579, 7962 respectively, for a success rate of 57%.

With 2 outs, we have: 20229, 50633, 11128. In order to get a 60% success rate, we need to weight the second number at 0.27. If we weight it at 0.70, we get a 36% success rate.

Alternatively, in order to get a 36% success rate for the 0 out and 1 out situations, we need to weight the missed opps as 2.5 misses each. Since this is impossible to explain, we reject this possibility.

So, if one concern we have is that we will bias the success rates based on the disproportionate number of opps with runners on 3B and less than 2 outs, then we cannot set the missed opps of runner on 3B with 2 outs as high as 0.70. It has to come down all the way to 0.27.

8:27 PM May 28th

tangotiger

"A batter who strikes out or pops out with a runner on third and no one out has missed a very easy RBI opportunity"

If the batter makes the first out, the next batter also has a fantastically great chance to drive in that runner. So, the handoff is not that bad. But, if the batter makes the second out, he's really taking the bat out of the next batter's hands, since an out cannot score that runner from 3B.

Just as Bill said that not making an out with the runner on base simply hands off the opportunity to the next batter (and therefore we don't want to charge him a missed opp), a similar situation arises here in that you've gotta penalize the guy for making the second out much more than the guy who makes the first or third out.

***

Good (great) point about making sure that the success rate is similar for each base situation, since we don't want to bias the stat based on someone having a disproportionate number of "easy" or "hard" opps. By forcing a scale such that all the base situations yield similar success rates, then it adds a good deal to the discussion.

That said, I don't know if the current setup does that, so I'll work on it tomorrow to see how well this holds up.

7:42 PM May 28th
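The weighting arithmetic that tangotiger walks through in the comments above (runner on 3B, 2000-2007 counts) can be reproduced with a short Python sketch. The counts are the ones he reports; the function names are hypothetical, and "success rate" is assumed to mean runners scored divided by runners scored plus weighted misses.

```python
def success_rate(scored, failed_outs, miss_weight=1.0):
    """Runners scored divided by scored plus weighted misses."""
    return scored / (scored + miss_weight * failed_outs)


def weight_for_target(scored, failed_outs, target):
    """Miss weight w that solves scored / (scored + w * failed_outs) = target."""
    return scored * (1 - target) / (target * failed_outs)


# Runner on 3B: (times the runner scored, outs made without him scoring),
# keyed by the number of outs, as reported in the comment.
counts = {0: (13585, 8380), 1: (31662, 23579), 2: (20229, 50633)}

for outs, (scored, failed) in counts.items():
    print(outs, "out:", round(success_rate(scored, failed), 2))

scored, failed = counts[2]
print("two out, weight .70:", round(success_rate(scored, failed, 0.70), 2))
print("weight for 60%:", round(weight_for_target(scored, failed, 0.60), 2))
```

Solving for the weight instead of trying values by hand makes it easy to test other target rates against the same counts.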