By Bill James

May 23, 2008

*Trying to figure out how to count missed RBI*

OK, what exactly is an RBI Opportunity?

We have an “RBI Analysis” for each player in the statistics section, and a few weeks ago a reader asked, “Why don’t you count RBI Opportunities?” That’s not *exactly* what he asked; exactly what he asked was “I'm curious about one missing statistic: RBI percentage = RBI / RBI Opportunities. It seems to me that (barring IBB) every time a batter steps to the plate, he has a chance of getting at least one RBI. If the bases are loaded, he has a chance of getting as many as 4 RBI. Why didn't this simple stat appear at the same time as RBI? And why isn't it used now? Actually, the answer to the second one is probably that there are other better ways of measuring that -- but what are they?”

I told Mr. Anonymous, who posed the question, that that sounded like a good idea and we would try to include it in the stat section. We put it on the to-do list, and in the fullness of time our programmer got to it. But then we had to face the question, “What is an RBI Opportunity?”

The method implied by the questioner is that RBI Opportunities are

1) All runners on base, plus

2) All plate appearances except maybe Intentional Walks.

We had actually counted RBI opportunities, by this definition, in a book we did years ago, __The Baseball Scoreboard__. But this definition, when you think about it, has a couple of really serious problems. The smaller problem, which is still serious, is that it will produce “RBI percentages” which will be, for the most part, absurdly low. Alex Rodriguez led the majors in RBI in 2007, but by this method his RBI percentage would have been... well, I don’t know exactly, but something less than .150. He had 697 Plate Appearances, not counting Intentional Walks, and he came to the plate 382 times with men on base. Even if it was only one runner on base each time, that’s at least 1079 “RBI Opportunities”, meaning that his RBI percentage would be less than .145. It doesn’t seem like a player who drives in 100 runs should have to go into arbitration and hear that he failed to drive in 91% of his RBI Opportunities.
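The arithmetic behind that ceiling can be sketched in a few lines of Python. The plate-appearance and runners-on counts are the ones quoted above; the 156 RBI total is Rodriguez's 2007 figure, which the article does not state and which is supplied here as an assumption, as are the variable names.

```python
# Lower bound on "RBI Opportunities" under the definition
# (plate appearances excluding IBB) + (runners on base).
pa_excluding_ibb = 697   # from the article
pa_with_men_on = 382     # from the article
rbi = 156                # assumption: 2007 RBI total, not quoted in the article

# Each of the 382 plate appearances had at least one runner on,
# so the true opportunity count can only be higher than this.
min_opportunities = pa_excluding_ibb + pa_with_men_on

# Because the denominator is a lower bound, this is an upper bound on the rate.
max_rbi_pct = rbi / min_opportunities

print(min_opportunities)
print(f"{max_rbi_pct:.4f}")
```

Since the real count of runners on base is larger than 382, the actual percentage under this definition would be lower still.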

I could live with that, I guess, but there is a more serious problem. In 1962 Tommy Davis, batting behind Maury Wills when Wills was stealing 104 bases, drove in 153 runs with only 27 doubles and 27 homers, although he did hit a lot of singles. In 1985 Terrible Tommy Herr, batting behind Vince Coleman when Coleman stole 110 bases, drove in 110 runs with only 8 home runs. Obviously these players benefited from having many runners in scoring position when they came to the plate.

But if you simply count every man on base as an RBI opportunity, then these players would have had more RBI opportunities if Wills or Coleman had never attempted to steal a base, since, by this logic, every caught stealing reduces the RBI opportunities for the next hitter, while a successful steal does nothing at all. That doesn’t make any sense.

I guess I could live with that, too, if I thought we could get by with it. We wouldn’t. The flaw in the statistic would become obvious, and we’d get hammered for propagating an obviously illogical system for counting RBI opportunities.

OK, so how *do* we count RBI opportunities? I’ve turned several other things over in my mind that don’t quite work, and I have one thing in mind that might work, but I’d like to know what you all think about it. My first idea was that a player might be credited with a “full” RBI opportunity if he batted with a runner on third base, two-thirds of an opportunity if he batted with a runner on second, and one-third if he batted with a runner on first. But this would mean that there would be players, measured in short cycles, who would have RBI production rates greater than one. Run this one through your head: *Conor Jackson was named the National League player of the week last week, when he drove in 14 runs with only 9 RBI opportunities.* Huh?

My next thought was that we might charge a player with 0.50 RBI opportunities for a runner on first, 1.00 for a runner on second, and 1.50 for a runner on third. But (a) this implicitly says that, even when a hitter drives in a runner from third base, he has only accomplished two-thirds of his job (1.00 RBI divided by 1.50 RBI opportunities), and (b) it also leaves open the possibility, although unlikely, that a player’s RBI could exceed his RBI opportunities.
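A small Python sketch shows how both of these fractional schemes break. It assumes that only runners on base count toward opportunities (the batter himself adds nothing), and the plate appearances are hypothetical; the names `THIRDS`, `HALVES`, `opportunities` and `rbi_rate` are invented for the illustration.

```python
from fractions import Fraction

# Opportunity weight per runner, keyed by base, for the two rejected schemes.
THIRDS = {1: Fraction(1, 3), 2: Fraction(2, 3), 3: Fraction(1)}
HALVES = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(3, 2)}

def opportunities(runners, weights):
    """Sum the opportunity weights of the occupied bases (1, 2 or 3)."""
    return sum(weights[base] for base in runners)

def rbi_rate(rbi, runners, weights):
    """RBI divided by weighted opportunities for one plate appearance."""
    return Fraction(rbi) / opportunities(runners, weights)

# Hypothetical: runner on first, batter hits a two-run homer.
print(rbi_rate(2, [1], THIRDS))   # rate far above 1 under the first scheme
print(rbi_rate(2, [1], HALVES))   # still above 1 under the second scheme

# Hypothetical: runner on third, batter drives him in, second scheme.
print(rbi_rate(1, [3], HALVES))   # 2/3, the "two-thirds of his job" problem
```

Exact fractions are used rather than floats so that the two-thirds result prints as a fraction instead of a rounded decimal.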

OK, well how about this. This is my proposal. We count a player’s “RBI opportunities” as the sum of two things:

1) Actual RBI, and

2) Missed RBI opportunities.

Missed RBI opportunities are counted as

1.00 for a runner left on third base with less than two out,

0.70 for a runner left on second base, or on third base with two out,

0.40 for a runner left on first base.

However, *no missed opportunities are charged if the batter does not make an out.* Runner on first, batter singles, he hasn’t *missed* the opportunity; the opportunity still exists. He hasn’t *missed* the opportunity; he’s improved it, and handed it on to the next hitter.
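Here is a minimal Python sketch of the proposal, not a finished implementation. Several things in it are assumptions rather than part of the proposal as stated: the `PlateAppearance` record and its field names are invented, "two out" is read as the number of outs when the batter came to the plate, `left_on` means the bases occupied at that moment by runners who did not score on the play, and the weights of several stranded runners are simply added together.

```python
from dataclasses import dataclass

@dataclass
class PlateAppearance:
    rbi: int              # runs driven in on the play
    batter_out: bool      # True if the batter made an out
    outs_before: int      # outs when the batter came up (0, 1 or 2)
    left_on: tuple = ()   # bases (1, 2, 3) of runners who did not score

def missed_weight(base, outs_before):
    """Weight charged for one stranded runner, per the proposed values."""
    if base == 3:
        return 1.00 if outs_before < 2 else 0.70
    if base == 2:
        return 0.70
    return 0.40

def missed_opportunities(pa):
    """No charge at all unless the batter made an out."""
    if not pa.batter_out:
        return 0.0
    return sum(missed_weight(base, pa.outs_before) for base in pa.left_on)

def rbi_percentage(pas):
    """RBI / (RBI + missed opportunities); 0.0 if there were no opportunities."""
    rbi = sum(pa.rbi for pa in pas)
    missed = sum(missed_opportunities(pa) for pa in pas)
    total = rbi + missed
    return rbi / total if total else 0.0

# Hypothetical four plate appearances.
sample = [
    PlateAppearance(rbi=1, batter_out=False, outs_before=1),               # RBI single
    PlateAppearance(rbi=0, batter_out=True, outs_before=0, left_on=(3,)),  # charged 1.00
    PlateAppearance(rbi=0, batter_out=True, outs_before=2, left_on=(1,)),  # charged 0.40
    PlateAppearance(rbi=0, batter_out=False, outs_before=0, left_on=(1,)), # single, no charge
]
print(f"{rbi_percentage(sample):.3f}")
```

Because actual RBI appear in both the numerator and the denominator and missed opportunities are never negative, the result stays between 0 and 1 under these assumptions.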

Do you think that would work? It would seem to me that it would. You would have a number, RBI percentage, that

a) would never exceed 1.000,

b) would never be less than zero,

c) would increase whenever a batter drove in a run,

d) would decrease whenever he made an out with a man on base without driving in a run, and

e) would be most heavily affected when he failed to come through with runners in scoring position.

It seems to me that those are the things we are trying to accomplish, and also that this method could not reasonably fail to correlate strongly with actual RBI, while at the same time delivering a much higher percentage for a player who drove in 100 runs while batting 125 times with runners in scoring position than for a player who drove in 100 runs while batting 200 times with runners in scoring position.

I’d like to hear your opinions (below), and also we’ll post a poll question on it, and gather some opinions that way. Thanks. Appreciate your interest.


## COMMENTS (19 Comments, most recent shown first)

garywmaloney

Makes sense to me -- also parses between types of RBI opps. Can't wait to see how Joe Carter fares on THIS measure.

This should also be programmable from Retrosheet's game logs.

9:20 PM May 26th

ScottSegrin

I think that simply measuring the percentage of RISP driven in would accomplish pretty much the same thing and would be *MUCH* easier for the average fan to get their arms around. While technically batting with no runners on or a runner on first does represent an RBI opportunity for the batter, we don't really think of it that way. We do with RISP - thus the name we give those runners.

8:13 AM May 26th

tommeagher

For whatever reason, I felt like looking at the claim made by 800redsox9. This is rough, so no gospel. I compared his base-state performances (splits off B-R.com) to other great hitters of his generation with lots of R and/or RBI (Brett, Carew, R. Jackson, Molitor, Murray, Parker, Winfield, Yount). I just took Tango's numbers for likelihood of scoring from each base in the retrosheet era and used that to generate values for runs driven in above average, broken down between contributing to the baserunners scoring and the batter scoring.

Be forewarned: none of what follows is park or league adjusted, and since I was using splits rather than PBP I had to estimate the distribution of runners scoring for each base-state (and I mean a mathematical formula to estimate it, not a guess).

Schmidt scored 8.9% of runners on first, 16.4% of runners on second, and 32.3% of runners on third. The group of HOF hitters (plus Parker, who was outstanding at driving runners in) was 6.9%, 18.9%, and 38.9%. So Schmidt, as you would imagine, was much better at driving in a runner from first and not as good at driving in runners from second and third. Of course, Schmidt also drove himself in in 5.5% of PA, compared to 3.0% for the group.

Overall, I have the average Schmidt PA as being worth .0021 runs above average in terms of helping to score runners already on base, taking into account runners scored, runners advanced, outs generated, & GDP. The rest of the group is .0013 RAA per PA. In terms of contributing to his own likelihood of scoring, Schmidt is .0419 RAA per PA, compared to .0167 for the group. That totals out to 261 runs better than the group average over the course of his career, and wOBA has the advantage as 249 runs (.382 vs. .353 for the group).

What is most striking is whether first base was open or not for Schmidt. DI = runs driven towards home above average per PA, PO = runs put on above average per PA. wOBA excludes IBB.

Bases empty: 51.7% of PA, .383 wOBA, 0 DI, .0395 PO, .0004 IBB/PA

Runner on first only: 17.8%, .392 wOBA, .0256 DI, .0413 PO, .0006 IBB/PA

Runner(s) on, 1B open: 18%, .363 wOBA, -.0285 DI, .0461 PO, .1054 IBB/PA

Runners on, 1B not open: 12.5%, .386 wOBA, .0212 DI, .0466 PO, .0056 IBB/PA

Here's the rest of the group:

Bases empty: 54.1% of PA, .348 wOBA, 0 DI, .0120 PO, .00008 IBB/PA

Runner on first only: 18.6%, .370 wOBA, .0077 DI, .0198 PO, .0002 IBB/PA

Runner(s) on, 1B open: 15.5%, .354 wOBA, .0006 DI, .0266 PO, .0858 IBB/PA

Runners on, 1B not open: 11.9%, .354 wOBA, -.0022 DI, .0199 PO, .0043 IBB/PA

I estimated the RBI Opp. stat that BJ suggests in this article, just using .9 for all runners at 3rd base. This figures to be a bit off since it's not PBP LOB but an estimate based on outs. Schmidt is .417 RBI/Opp, and the group is .382. With a runner just on first, his edge is .357 to .272. With multiple baserunners and a runner on first, it's .389 to .369. With first base open and 1-2 runners on, he trails .351 to .376.

His overall RBI/Opp is better than anyone in the group:

.417 Schmidt

.400 Murray

.397 Brett

.395 Jackson

.391 Parker

.385 Winfield

.373 Molitor

.356 Yount

.344 Carew

With men on:

.370 Brett

.369 Schmidt

.367 Murray

.365 Parker

.350 Winfield

.349 Jackson

.345 Molitor

.334 Carew

.332 Yount

RISP:

.390 Brett

.386 Murray

.379 Parker

.374 Molitor

.372 Schmidt

.369 Carew

.367 Winfield

.355 Yount

.354 Jackson

RISP, 1B Open:

.403 Brett

.387 Molitor

.387 Carew

.382 Parker

.382 Murray

.365 Winfield

.354 Yount

.351 Schmidt

.349 Jackson

RISP, 1B not open:

.389 Murray

.389 Schmidt

.378 Brett

.377 Parker

.368 Winfield

.362 Molitor

.357 Jackson

.356 Carew

.356 Yount

6:47 PM May 25th

nettles9

For each of these methods proposed, what would be the result of a batter leaving two men on base (1st & 2nd, 1st & 3rd, 2nd & 3rd) and leaving the bases loaded? I had to have missed something because in reading all of these proposals, it made me wonder how it would compute if multiple runners were stranded by the batter. Just a thought.

9:30 AM May 25th

greggborgeson

I see two missing elements.

1) Sac bunt does not count as an RBI opportunity. Batter was ordered NOT to drive in a run.

2) GIDP counts as two missed opportunities -- because batter is literally taking away two chances to drive in runs.

7:37 AM May 24th

Richie

Sounds great. 1 vote for going for it.

12:12 AM May 24th

wovenstrap

I suspect that this would go over well with the statheads and almost nobody else. Remember how much people despised GWRBIs? Well, they'll view this the same way, similar anyway. If it isn't a simple counting stat and a simple percentage, it'll just lose 90% of the audience. I think the initial mistake here is counting the batter himself as an RBI opportunity. This is more like runners inherited -- how many runners inherited did the pitcher allow to score? This should be the inverse/obverse/something of that.

11:06 PM May 23rd

rtayatay

Along the lines of the 'run expectancy' notes left by others, I think the number of outs at the time of the opportunity needs to be taken into account in some way. While we're on the topic... why do we award RBI's for groundouts but not for GIDP's?

9:40 PM May 23rd

tangotiger

Similar in spirit to what Bill is doing here, someone several days ago asked about left on base. Below you will find his question and my answer.

***

> I was wondering what the exact calculation for LOB is for hitters. If a hitter comes up with 1 man on, and is HBP, does he get stuck with 1 LOB? and what happens if he gets on via error? The third case is what if there are 2 men on, and he hits into a double play? does he have 1 LOB or 2?

Since there's no official category, you get to do what you want.

Specifically here, what is it that you are trying to accomplish? It would seem that you want to count the number of runners that (a) were removed from the bases without scoring, (b) didn't move a base, with an out on the play, and (c) still on base on the third out. That's what I would track.

In your case, HBP and ROE don't get any counts, since you have no outs. The DP means that you have 2 runners counting (the guy who got out, and the guy who didn't move up the base or was left to end the inning).

Whether you call that a LOB or a Quatlu is your preference.

***

So, it shares some of the characteristics of Bill's RBI opps, in that we don't want to penalize a hitter for doing something good if a run didn't directly result.

8:54 PM May 23rd

tangotiger

Schmidt hit 291 solo HR, or 53% of his total:

http://www.baseball-reference.com/pi/event_hr.cgi?n1=schmimi01&type=b

Reggie hit 308, or 55% of his total:

http://www.baseball-reference.com/pi/event_hr.cgi?n1=jacksre01&type=b

8:29 PM May 23rd

800redsox9

Bill -

I am completely interested. I was born in Malden, MA (in '59) and moved to South Jersey in '67. Red Sox fan (70%)/Phillies fan (30%). Mike Schmidt always seemed to me to have hit 400 solo homers. I need to do some research, but I bet he has the fewest RBI of the 500 club. This metric piqued my interest immediately.

I don't think Mike had "nobody on" (Rose, Bowa, Trillo, Morgan, etc.) I would be willing to bet he scores poorly in RBI opp.

8:20 PM May 23rd

ibrosey

When you decided the opportunity gets passed along if the batter doesn't make an out, the whole concept seemed to fall together. I like the theory and look forward to seeing some data.

7:36 PM May 23rd

Trailbzr

I'd start by asking "What is the purpose of this stat?" It sounds like something along the lines of "to adjust distortions in the raw count of RBIs for discrepancies in opportunity."

If that's the case, then I don't think forcing a .000-1.000 scale is what you're after. If you want to know what does 150 RBI mean in the context of opportunities, you would normalize to 1.000=average performance and set up a ratio from there.

How to measure opportunity? There seem to be four strata:

a good out drives in from third without 2 outs

a hit drives in from scoring position

an extra-base hit drives in from first usually

a home run drives in the batter

To make such a measure meaningful, you'd need a standard for each of those things, which starts to get into Fun With Numbers.

5:08 PM May 23rd

tangotiger

And if you want to make it really quick, you could do:

1.00 RISP

0.50 runner on 1B

The run potential of the runner in scoring position is dropped by around 25%, compared to the runner on 1B of 13% or so. Gotta admit, there's something quite attractive about such a simple rule.

Do we really need the added wrinkle of the runner on 3B and 1 out as the sole exception to the above? That would comprise some 4% or so of all outs made. I doubt it would make any difference to any ranking.

4:21 PM May 23rd

GOODFRIEND

1) The idea of NOT charging a missed opportunity if no out is made is a revolutionary concept that is sort of the reverse of charging errors (which doesn't make a lot of sense). What a great idea. This concept can find its way into many baseball metrics.

2) Arbitrarily assigning a value to the runners left on base seems illogical given that we don't really have a clue what those values should be. There must be a better way to just count (or not count) whole things. I'll come back to you if (I stumble over it).

4:20 PM May 23rd

boutilij

If RBI Opportunities are going to be measured, won't we need to know the likelihood of a batter driving in a runner from third, second, or first base? (Figuring out the batter's likelihood of driving himself in via a home run would be more straightforward, by comparing a player's HR rate to MLB averages.) If these figures exist, then we'd have a better idea of knowing whether the proposed values are appropriate. Perhaps a plus/minus system that compared a batter's actual RBIs versus expected RBIs, based on the MLB averages, would work better. It might also be easier to adjust for variability in the hitting environment, since the method proposed in the article implies that a hitter has the same opportunity to drive in a runner at Coors Field as he would in Petco Park.

3:28 PM May 23rd

tangotiger

OK, thought about it. Pretty decent. When you have a runner on 1B and 0 outs, and the batter makes an out, his chances of scoring go from about 40% to 27%. With 1 out, his chances of eventually scoring go from 27% to 13%. And, with 2 outs, they go from 13% to 0%. So, we can see here that when the batter makes an out, the run potential of the runner on 1B is about minus 13%. Repeating for the runner on 2B, that's minus 22%. For the runner on 3B, it's roughly minus 30%. If you multiply all those numbers by 3.33, you get: 1B: 0.43, 2B: 0.73, 3B: 1.00, numbers that are remarkably close to Bill's numbers.

Now, I will quibble about the runner on 3B and less than 2 outs. The run potential of the runner with 0 outs and 1 out is a gap of around 20%. That is, he's got an 87% chance of scoring with 0 outs and on 3B, and 66% with 1 out. The gap between 1 out and 2 outs is 40%. And between 2 outs and 3 outs is 26%.

So, going back to the whole process, and setting the "1.00" to failing to drive in the runner from 3B with exactly 1 out, we simply multiply all our numbers by 2.5. We get:

1.00 runner on 3B, 1 out

0.60 runner on 3B, 0 or 2 outs, runner on 2B any outs

0.30 runner on 1B

I will propose these numbers and this scheme, roughly, as the best numbers to use.
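The two rescalings in this comment can be checked with a short Python sketch. The percentages are the ones quoted in the comment; the dictionary names are invented, and treating the 0-out gap for a runner on third as 87% minus 66% is an inference from the figures given rather than a number the comment states outright.

```python
# Drop in a runner's chance of scoring when the batter makes an out.
drop_by_base = {"1B": 0.13, "2B": 0.22, "3B": 0.30}

# First pass: multiply by 3.33 so the runner on third lands near 1.00.
first_pass = {base: drop * 3.33 for base, drop in drop_by_base.items()}

# Second pass: split the runner on third by outs and scale by 2.5
# so that failing with exactly one out is worth 1.00.
drop_3b_by_outs = {0: 0.87 - 0.66, 1: 0.40, 2: 0.26}
second_pass_3b = {outs: drop * 2.5 for outs, drop in drop_3b_by_outs.items()}
second_pass_other = {base: drop_by_base[base] * 2.5 for base in ("1B", "2B")}

for base, weight in first_pass.items():
    print(base, f"{weight:.2f}")
for outs, weight in second_pass_3b.items():
    print("3B,", outs, "out:", f"{weight:.3f}")
for base, weight in second_pass_other.items():
    print(base, f"{weight:.3f}")
```

The unrounded second-pass values for the 0.60 and 0.30 tiers fall a little to either side of those figures, which fits the comment's own description of the scheme as rough.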

3:22 PM May 23rd

tangotiger

I've always based it on the 24 base/out state expectancies, just about exactly as Tom Ruane has done here:

http://www.retrosheet.org/Research/RuaneT/rbipro_art.htm

(I always remove HR from the equation, and focus only on RDI, or runners driven in, or RBI-HR. That's a topic for another day.)

Anyway, Bill's approach here seems interesting, but I'll have to study it a bit more.

1:37 PM May 23rd

mskarpelos

I like the general approach, but not the precise implementation. I suggest using adjustments based on the famous 24-node finite state Markov Chain instead of the proposed simple adjustments of 1.0, 0.7 and 0.4. That is, the adjustments should be proportional to the expected number of runs for the Markov Chain state in question. Also, since a batter always has an opportunity to drive himself in with a home run, he should register at least some missed RBI opportunity whenever he makes an out even if nobody is on base. Using the Markov Chain approach covers this situation as well.

1:07 PM May 23rd