Speed and Runs Scored

By Bill James

May 24, 2020

In response to a previous article, John Rickert suggested that I try... well, here; I’ll quote his suggestion:

Would it be reasonable to use this data to match fast and slow players with the same on-base percentage and see if there is a noticeable speed advantage? And if so, to look at it for several on-base percentages such as .300, .325, .350, .375, .400 to see if such an advantage would increase or decrease as the players get on base more often? Or if it disappears at some levels and is present at others? That does run the risk of splitting the cohorts too finely.

I have a study which, while it is not exactly the study suggested by Professor Rickert, was prompted by it. The study shows clearly and definitively that the effect of speed on runs scored is not only "noticeable"; it is actually stunningly large—stunning to me. The study shows that very fast players scored 16% more runs than very slow players who had absolutely identical performance in every statistical category, including stolen bases, and from the same position in the batting order. Some of this difference no doubt was created by the fact that slower players were pinch run for more often than faster players. I have no way of adjusting for that.

OK, this is the study. This study is based on the Game Logs that I maintain, mostly to entertain myself; I have added three players in the last few days, so it is now up to 277,251 game lines. First I re-created the "speed system" that I invented for the previous article, modifying it just a little bit. (I had foolishly discarded the speed system after using it before, thinking of it as a one-time use.) Having created speed scores for each player in the 101-game period surrounding the focus game, I then sorted players into five "Speed Groups", with the fastest 55,000 players being coded "5", the next group coded "4", and the slowest group of players coded "1".

Then, for each game line I created a thirteen-element code, indicating:

1) How many at bats the player had,

2) How many hits he had,

3) How many doubles he hit,

4) How many triples,

5) How many home runs,

6) How many walks (or hit batsmen; walks and hit batsmen were treated as the same thing),

7) How many strikeouts,

8) How many Sacrifice Hits,

9) How many Sacrifice Flies,

10) How many Grounded into Double Plays,

11) How many Stolen Bases,

12) How many Caught Stealing, and

13) What position in the batting order the player occupied (leadoff, cleanup hitter, etc.)

In other words, each group of players was compared not to a group of hitters who had SIMILAR performance, but to a group of hitters who had IDENTICAL performance, identical in every performance category except runs scored and RBI. Thus, any difference in how many runs they scored could not be attributed to... well, stealing bases, or hitting triples. Those things were held constant.
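As a rough illustration of the matching idea, here is a minimal Python sketch. The field names and the tuple representation are assumptions made for the example; the article lists the thirteen elements but does not spell out the exact layout of its code string.

```python
from collections import namedtuple

# Hypothetical field names for the thirteen elements listed above.
GameLine = namedtuple(
    "GameLine",
    "ab h doubles triples hr bb_hbp so sh sf gidp sb cs order_pos",
)

def game_code(line):
    """Return a hashable key; two game lines match only if all 13 elements agree."""
    return tuple(line)

# A 1-for-4 game (the hit a single), once batting third and once batting cleanup.
third = GameLine(4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3)
cleanup = GameLine(4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4)

# Identical batting lines from different lineup spots get different codes.
same_code = game_code(third) == game_code(cleanup)
```

Runs scored and RBI are deliberately left out of the key, since runs scored is the outcome being compared.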

Having created a code to identify every game line, I then sorted the data so that each game was put into a group of games with identical codes. The most common code was 10000000000009: that is, players who had 1 at bat in the game, batting from the 9th position, and made an out which was not a strikeout and not a GIDP. There were 5,073 of those in the data, of which 549 were by "5" runners, 757 were by "4" runners, 859 were "3" runners, 983 were "2" runners, and 1,925 were "1" runners, very slow runners.

The second most-common code was 41000000000004—that is, cleanup hitters who were 1 for 4 in the game, the hit being a single. There were 2,728 of those games. The third most-common code was 41000000000003—that is, the same thing, except with batters who were hitting third, rather than fourth. There were 2,564 of those.

Having done this, I eliminated all of the codes of which there were fewer than 100 examples. This kept in the study the 507 most common codes and eliminated, I would guess, tens of thousands of "uncommon" codes, but the 507 common codes accounted for 180,536 games, almost two-thirds of the games in the data.
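The filtering step could be sketched along these lines. The function name and the toy data are made up for illustration; only the 100-example threshold and the two game totals come from the article.

```python
from collections import Counter

MIN_EXAMPLES = 100  # threshold stated in the article

def common_codes(codes, min_examples=MIN_EXAMPLES):
    """Count each code and keep those with at least min_examples games."""
    counts = Counter(codes)
    return {code: n for code, n in counts.items() if n >= min_examples}

# Toy data only: three made-up codes with 120, 99 and 100 games.
toy = ["A"] * 120 + ["B"] * 99 + ["C"] * 100
kept = common_codes(toy)  # "B" falls one game short and is dropped

# Share of the real data that survives the cut, from the article's totals.
coverage = 180_536 / 277_251  # a bit over 65%
```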

At this point this doesn’t yield usable data, but we are getting there. Let us take the 2,564 third-place hitters who hit a single in four at bats, no walks or hit by pitch, etc. Of those 2,564 players, 481 were "5" runners—very fast—671 were "4" runners, 593 were "3" runners, 409 were "2" runners, and 410 were sluggards. Does that add up to 2,564? If it doesn’t, I don’t want to hear about it.

Anyway, the 481 "5" runners who did this scored 145 runs; in other words, just a hair more than 30% of them scored a run in the game. (Actually, probably a few of them scored two runs; I don’t know. I just totaled up the runs scored.) Of the 671 "4" runners, 194 scored a run, or 28.9%. Of the 593 "3" runners, only 139 scored a run, a sharp drop to 23.4%. But then, of the 409 "2" runners, 109 scored runs (26.7%), and of the 410 "1" runners, 27.6% scored runs.
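For anyone who wants to check the arithmetic, a short Python sketch follows. The run total for the "1" runners is left out because the article gives only their percentage.

```python
# Games and runs for third-place hitters who went 1-for-4 with a single.
games = {5: 481, 4: 671, 3: 593, 2: 409, 1: 410}
runs = {5: 145, 4: 194, 3: 139, 2: 109}  # no run total is given for group 1

total_games = sum(games.values())

# Runs per 100 games, rounded to one decimal as in the text.
rates = {g: round(100 * runs[g] / games[g], 1) for g in runs}
```

Because the runs are simply totaled, a rate here is runs per game rather than strictly the share of players who scored.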

The runs scored decrease as the players get slower, but irregularly. Of course, we can’t draw any firm conclusion just from this comparison, but each one of the 507 common game lines yields a data set like this. All we have to do now is combine them into one package. So how do we do that?

Here’s how I did that... well, I’d better explain. The packages are all or mostly all "speed loaded" in one direction or the other. For example, the code 30000010001006 (that is, sixth-place hitters who went 0-for-3 with a strikeout and a grounded-into-double play) was represented by 4 games by "5" runners, and 46 games by "1" runners. Fast runners (a) usually don’t hit sixth, and (b) don’t ground into nearly as many double plays as the slow runners. On the other hand, the code 40000000000001 (that is, leadoff men who went 0-for-4) is represented by 924 "five" runners, but only 39 "one" runners, since leadoff men are usually fast.

Because of this "speed loading", we can’t just add up the totals for all of the codes, or anything simple like that. This is what I did. I multiplied the runs scored/game from each speed-and-code sample by the smallest number of games in that code sample. In other words, since there are at least 409 games in each of the five "speed codes" that make up the Game Code 41000000000003, then the data for each sample can be safely stated as if it had occurred in 409 games—since each one of them was AT LEAST 409 games. Since there were 481 games with that code by "five" runners and they scored 145 runs, we record that as 409/481 * 145, which is 123.2952. We enter that as 123.2952 "run credits" in 409 games.
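The scaling described above can be sketched in Python as follows. The function names are hypothetical, and the combining step is a reading of the description rather than the author's actual worksheet: each code contributes its smallest group size as "games" to every speed group, and each group's runs are scaled down to that size.

```python
def run_credits(games_by_speed, runs_by_speed):
    """Scale each speed group's runs to the smallest group size within one code."""
    base = min(games_by_speed.values())
    credits = {
        s: base / games_by_speed[s] * runs_by_speed[s] for s in runs_by_speed
    }
    return base, credits

def combine(samples):
    """Sum games and run credits over many codes (one sample per code)."""
    total_games = 0
    totals = {}
    for games, runs in samples:
        base, credits = run_credits(games, runs)
        total_games += base
        for s, c in credits.items():
            totals[s] = totals.get(s, 0.0) + c
    return total_games, totals

# Third-place hitters who went 1-for-4 with a single (figures from the article;
# the run total for group 1 is not given, so it is omitted).
games = {5: 481, 4: 671, 3: 593, 2: 409, 1: 410}
runs = {5: 145, 4: 194, 3: 139, 2: 109}

base, credits = run_credits(games, runs)  # base is 409; credits[5] is about 123.2952
```

Since every speed group is credited with the same number of games in every code, a code dominated by fast leadoff men or slow sixth-place hitters cannot tilt the totals toward either group.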

Well, all we have to do now is figure ALL of the run credits and all of the games for every cohort. We have plenty of data. We have 16,375 "Games" for each speed group. This is the data:

Speed	Games	Runs
5	16375	5771.108
4	16375	5623.86
3	16375	5392.048
2	16375	5185.623
1	16375	4958.463

Let me get rid of the decimals; those are just annoying:

Speed	Games	Runs
5	16375	5771
4	16375	5624
3	16375	5392
2	16375	5186
1	16375	4958

16,375 games represent a little over 100 seasons' worth of full-time data. 100 seasons, 162 games a season, that’s 16,200 games for each group. Runs scored increase with each step up in speed. The fastest runners scored 813 more runs than the slowest runners, or 8 more runs per season.
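The figures in that paragraph can be reproduced from the rounded table with a few lines of Python (variable names are just for the example):

```python
# Rounded run totals per speed group, each over 16,375 games.
runs = {5: 5771, 4: 5624, 3: 5392, 2: 5186, 1: 4958}
games = 16375

seasons = games / 162            # a little over 101 full seasons
gap = runs[5] - runs[1]          # fastest group minus slowest group
gap_per_162 = gap / games * 162  # about 8 runs per 162 games

# Gain in runs for each one-step increase in speed: 1->2, 2->3, 3->4, 4->5.
steps = [runs[s + 1] - runs[s] for s in (1, 2, 3, 4)]
```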

You may note that the numbers are low. 4,958 runs in 100 seasons is only 50 runs per player per season, a very low total. I know why it is so low.

It’s because games with triples and home runs rarely get included in the study, because those games are less common. I actually don’t think there is ANY game code with a triple included in the study—lots and lots of doubles, walks, singles, some stolen bases, some games with two hits and three hits, a few home runs, but no games with four hits, and no triples. The more productive the game was for the hitter, the less likely it was to be a common line—and remember, we have to have 100 FROM ONE POSITION IN THE BATTING ORDER. This reduces the number of runs scored in each test sample.

Reducing the base (reducing the runs scored) makes the measured percentage larger than it actually is. If you had an 8-run difference with an 80-run base, that would be a 10% increase; it only measures at 16% because the base is artificially low. Also, the total base of 100 seasons is actually somewhat larger than 100 seasons, because we have excluded one-third of the games. It’s not actually 8 runs per season top to bottom; it’s probably more like 7 runs, assuming that there would be some additional "run benefit" in the unused part of the data.
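To make the point about the base concrete, here is the same gap expressed against two different bases. The 80-run base is the article's hypothetical, not a measured figure.

```python
slow_runs, fast_runs, games = 4958, 5771, 16375

# Base actually measured in the study: slow runners' runs per 162 games.
measured_base = slow_runs / games * 162  # about 49 runs

# The gap as a percentage of the measured (low) base.
pct_measured = (fast_runs - slow_runs) / slow_runs * 100  # about 16.4%

# The article's hypothetical: the same 8-run gap on an 80-run base.
pct_on_80 = 8 / 80 * 100
```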

So there you have it. Very fast runners score about 7 to 8 more runs, per 162 games, than very slow runners with identical production from the same batting order position. Some of this separation may be explained by the tendency of teams to pinch run for slower players.

Thank you for reading, and thanks to John Rickert for the suggestion.

COMMENTS (20 Comments, most recent shown first)

KaiserD2
I finally got around to reading these and I am fascinated. I have several comments/questions.

1. The results are so consistent that it seems they must be meaningful. I am slightly surprised by this for one reason. Players only partially control the number of runs they score. That is partially a function of their own walks, hits, etc., but it's also a function of their era, their ballpark, and their teammates, all of which will independently raise or lower the number of runs they score. I have never investigated this systematically but my impression is that this is less of a problem for runs scored than for RBI, but there still must be some effect. I don't see any way that those effects are built into this study, yet the study produced very significant results.

2. Regarding the first part of the story that simply compared the runs scored of the faster and slower (but more selective) leadoff hitters, I was wondering, how does the difference in their average results compare to what you would see if you just computed Bill's Runs Created, or used linear weights, or any comparable measure? In other words, did this study simply confirm something we already knew, or does it change how we rate speed as opposed to OBP?

3. The same question applies to the greater number of runs scored by the faster players. Is it what we would have expected from their overall data?

David Kaiser
8:41 AM May 29th

bjames
bhalbleib
Bill, not to start another rabbit hole, but do you think the slower players getting on more by error than faster players is a result of something the players did (i.e. Harmon Killebrew hits the ball harder more often than Rod Carew, therefore, the defender is more likely to make an error on his balls hit) OR is it that the subjective nature of errors means that the defender is more likely to get a pass by the official scorer when Willie Wilson hits a ball that he kicks than when Willie Mays Aikens hits a ball that he kicks (i.e. Wilson's ball is scored a hit and Aikens' ball is scored an error)??

I don't care what it is. I could care less. I just want people to stop believing that this is a real thing, that fast players are going to score more runs because they reach base more often on errors.
2:15 PM May 26th

bhalbleib
Bill, not to start another rabbit hole, but do you think the slower players getting on more by error than faster players is a result of something the players did (i.e. Harmon Killebrew hits the ball harder more often than Rod Carew, therefore, the defender is more likely to make an error on his balls hit) OR is it that the subjective nature of errors means that the defender is more likely to get a pass by the official scorer when Willie Wilson hits a ball that he kicks than when Willie Mays Aikens hits a ball that he kicks (i.e. Wilson's ball is scored a hit and Aikens' ball is scored an error)?
2:13 PM May 26th

bjames
Who would you guess reached base more often on an error: Rod Carew, or Harmon Killebrew?

It's close, but it's Killebrew. 1.89% of at bats for Killebrew, 1.87% for Carew.
10:47 AM May 26th

MarisFan61
bjames: (re what was the problem with that post of mine)
"Length, Clarity, and Relevance"

Granting that it's subjective:
You're wrong. :-)

I was answering what had been asked.
It was a serious, and GOOD (very good) reply and comment on what had been asked, and had taken a lot of work -- finding it, re-reading it, and relevantly excerpting it.

If there's something about that article that makes you not want it quoted or discussed, you could say -- but just deleting the comment because of "Length, Clarity, and Relevance" -- nah.
2:58 AM May 26th

bjames
Here's one: who would you guess reached base more frequently on an error: Tim Raines, or Tim Wallach?

Yep.

It's Wallach.
2:34 AM May 26th

bjames
Who would you guess reached base on error more frequently: Lou Brock, or Joe Torre?

It's Torre.
2:26 AM May 26th

bjames
MarisFan61
Bill: What was the problem?

Length, Clarity, and Relevance.
2:20 AM May 26th

bjames
Who would you guess reached base on an error more often: Frank Howard, or Willie Davis?

It's Howard, by far. Howard reached base about 40% more often, per at bat, than Davis.

Who would you guess reached base on an error more often: Gil Hodges, or Junior Gilliam?

It's Hodges, by about 10%.
2:19 AM May 26th

MarisFan61
Bill: What was the problem?
1:18 AM May 26th

bjames
I removed them. Oscar the Grouch may have inhabited my spirit for a time here. Had a tough day.
12:27 AM May 26th

MarisFan61
(Were my last two Comments removed on purpose, or is it just a glitch?
They were about Bill's article that was asked about, which was in the 1992 Baseball Book.
Was there perhaps something objectionable about my quoting from it?)
11:56 PM May 25th

bjames
In regard to speed and reaching on errors: who do you think reached on an error more frequently in his career: Richie Ashburn, or his longtime teammate Del Ennis?

It was Ennis. I knew it would be before I checked. A slow, hard-hitting right-handed hitter will reach on an error more often than a fast left-handed hitter.
11:51 PM May 25th

jrickert
Thanks! It's nice to see study of a question give a relatively clean answer. I'd be curious to see the RBI rates for each group, but I suspect that any effect would be smaller than the noise in this data. My initial guess was no speed effect but then I thought about a comment from the 1986 Abstract about a double by Rickey Henderson being slightly less valuable before the fact than a double by Ron Hassey because Henderson has leg doubles that would be singles for Hassey.
6:16 PM May 25th

shthar
Be interesting to see age worked into this.
3:43 PM May 25th

OwenH
Question for Maris and davidt50: do you remember where that leadoff study can be found? Would love to read it. Gracias!
12:38 PM May 25th

kgh
Regarding the possible pinch running effect: most pinch running happens later in the game. If you only used events in innings . . . say 1-6, would you see anything different than what you found above?
11:19 AM May 25th

3for3
Is the effect linear? Can we assume a 'very' player will be 4 runs over (fast) or under (slow)?
9:50 AM May 25th

davidt50
This is enjoyable, revealing reading. I like the way it all comes together. I know once there was a study comparing Rickey Henderson to Willie Mays as leadoff hitters.
The same team with Rickey leading off scored more runs as a team, than the same team did with Willie Mays leading off. Even though, as I recall, Willie himself scored more runs than Rickey, Rickey's team scored more.
1:37 AM May 25th

MarisFan61
Interesting and clear results.
I think it's fair to say that some of this is tough reading, but the gist and direction and results are very clear.

I wondered if the 'weightings' on the "speed-and-code samples" -- i.e. "multiplying the runs scored/game from each speed-and-code sample by the smallest number of games in that code sample" -- might have distorted the actuality, by giving equal weights to all the samples when maybe they don't deserve equal weights.
But, I'm thinking that actually it was designed to prevent wrong weighting. Anyway I think it's hard to grasp exactly what's the story about that, and would appreciate if someone could try to elucidate it.

I appreciate that you did explain so much about details that make the calculated numbers be somewhat off from what they 'really' are.

BTW, about pinch-running being a factor that somewhat penalizes the slower players in this calculation, i.e. not being able to account for runs taken away from them due to their being pinch-run for:
I would offer the thought that the fact of their needing a pinch runner is a cost to the team, in that it takes away some flexibility for the rest of the game, and so, for what it's worth, that this somewhat softens any significance of that 'artificial' loss.
i.e. We might say that by rights they should get credit for more runs than shown by these calculations, but on the other hand there is an inherent cost to the team in the process, which somewhat balances it out.

BTW #2: You said not to tell you if that thing doesn't add up to 2564.
It does, so I'm telling you. :-)
11:42 PM May 24th
