Speed and Runs Scored
In response to a previous article, John Rickert suggested that I try... well, here; I’ll quote his suggestion:
Would it be reasonable to use this data to match fast and slow players with the same on-base percentage and see if there is a noticeable speed advantage? And if so, to look at it for several on-base percentages such as .300, .325, .350, .375, .400 to see if such an advantage would increase or decrease as the players get on base more often? Or if it disappears at some levels and is present at others? That does run the risk of splitting the cohorts too finely.
I have a study which, while it is not exactly the study suggested by Professor Rickert, was prompted by it. The study shows clearly and definitively that the effect of speed on runs scored is not only "noticeable"; it is actually stunningly large—stunning to me. The study shows that very fast players scored 16% more runs than very slow players who had absolutely identical performance in every statistical category, including stolen bases, and from the same position in the batting order. Some of this difference no doubt was created by the fact that slower players were pinch run for more often than faster players. I have no way of adjusting for that.
OK, this is the study. This study is based on the Game Logs that I maintain, mostly to entertain myself; I have added three players in the last few days, so it is now up to 277,251 game lines. First I recreated the "speed system" that I invented for the previous article, modifying it just a little bit. (I had foolishly discarded the speed system after using it before, thinking of it as a one-time use.) Having created speed scores for each player in the 101-game period surrounding the focus game, I then sorted players into five "Speed Groups", with the fastest 55,000 players being coded "5", the next group coded "4", and the slowest group of players coded "1".
Then, for each game line I created a thirteen-element code, indicating:
1) How many at bats the player had,
2) How many hits he had,
3) How many doubles he hit,
4) How many triples,
5) How many home runs,
6) How many walks (or hit batsmen; walks and hit batsmen were treated as the same thing),
7) How many strikeouts,
8) How many Sacrifice Hits,
9) How many Sacrifice Flies,
10) How many Grounded into Double Plays,
11) How many Stolen Bases,
12) How many Caught Stealing, and
13) What position in the batting order the player occupied (leadoff, cleanup hitter, etc.)
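A minimal sketch of how such a code could be assembled, assuming each game line is stored as a record with one field per category. The field names and the use of a tuple as the key are illustrative assumptions; this does not reproduce the layout of the digit strings quoted later in the article.

```python
from typing import NamedTuple


class GameLine(NamedTuple):
    """One player's batting line for one game (field names are hypothetical)."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb_hbp: int       # walks and hit batsmen, treated as the same thing
    so: int
    sh: int
    sf: int
    gidp: int
    sb: int
    cs: int
    order: int        # batting-order position, 1 through 9
    runs: int         # outcome being studied, NOT part of the code
    speed_group: int  # 1 (slowest) to 5 (fastest), NOT part of the code


def game_code(line: GameLine) -> tuple:
    """Return the thirteen matching elements as a hashable key."""
    return tuple(line[:13])


# Two third-place hitters who each went 1-for-4 with a single.
fast = GameLine(4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, runs=1, speed_group=5)
slow = GameLine(4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, runs=0, speed_group=1)
same_code = game_code(fast) == game_code(slow)
```

Runs scored and speed group are deliberately left out of the key, so two lines share a code only when the thirteen listed categories match exactly.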
In other words, each group of players was compared not to a group of hitters who had SIMILAR performance, but to a group of hitters who had IDENTICAL performance, identical in every performance category except runs scored and RBI. Thus, any difference in how many runs they scored could not be attributed to... well, stealing bases, or hitting triples. Those things were held constant.
Having created a code to identify every game line, I then sorted the data so that each game was put into a group of games with identical codes. The most common code was 10000000000009: that is, players who had 1 at bat in the game, batting from the 9th position, and made an out which was not a strikeout and not a GIDP. There were 5,073 of those in the data, of which 549 were by "5" runners, 757 were by "4" runners, 859 were "3" runners, 983 were "2" runners, and 1,925 were "1" runners, very slow runners.
The second most-common code was 41000000000004: that is, cleanup hitters who were 1 for 4 in the game, the hit being a single. There were 2,728 of those games. The third most-common code was 41000000000003: that is, the same thing, except with batters who were hitting third, rather than fourth. There were 2,564 of those.
Having done this, I eliminated all of the codes of which there were fewer than 100 examples. This kept in the study the 507 most common codes and eliminated, I would guess, tens of thousands of "uncommon" codes, but the 507 common codes accounted for 180,536 games, almost two-thirds of the games in the data.
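The filtering step can be sketched as follows, assuming the codes are available as a plain list with one entry per game line. The function name and the toy data are hypothetical; only the 100-game threshold and the two totals in the last line come from the text.

```python
from collections import Counter


def common_codes(codes, min_games=100):
    """Count games per code and keep codes with at least min_games examples."""
    counts = Counter(codes)
    return {code: n for code, n in counts.items() if n >= min_games}


# Toy data: three made-up codes with 120, 99 and 100 games.
toy = ["A"] * 120 + ["B"] * 99 + ["C"] * 100
kept = common_codes(toy)
toy_coverage = sum(kept.values()) / len(toy)

# Share of the full data covered by the common codes, from the article's totals.
article_coverage = 180_536 / 277_251
```

Code "B" falls one game short and is dropped, while "C" sits exactly on the threshold and stays, which matches "fewer than 100" being the cutoff.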
At this point this doesn’t yield usable data, but we are getting there. Let us take the 2,564 third-place hitters who hit a single in four at bats, no walks or hit by pitch, etc. Of those 2,564 players, 481 were "5" runners (very fast), 671 were "4" runners, 593 were "3" runners, 409 were "2" runners, and 410 were sluggards. Does that add up to 2,564? If it doesn’t, I don’t want to hear about it.
Anyway, the 481 "5" runners who did this scored 145 runs; in other words, just a hair more than 30% of them scored a run in the game. (Actually, probably a few of them scored two runs; I don’t know. I just totaled up the runs scored.) Of the 671 "4" runners, 194 scored a run, or 28.9%. Of the 593 "3" runners, only 139 scored a run, a sharp drop to 23.4%. But then, of the 409 "2" runners, 109 scored runs (26.7%), and of the 410 "1" runners, 27.6% scored runs.
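The percentages above are just runs divided by games within each speed group. A small sketch using the figures quoted in the text; the run total for the "1" runners is given only as a percentage, so that group is left out of the runs table rather than guessed.

```python
# Games and runs for the code "third-place hitter, 1-for-4 with a single".
games = {5: 481, 4: 671, 3: 593, 2: 409, 1: 410}
runs = {5: 145, 4: 194, 3: 139, 2: 109}  # group 1 run total is not stated


def run_rates(games_by_speed, runs_by_speed):
    """Runs per game for every speed group that has a stated run total."""
    return {g: runs_by_speed[g] / games_by_speed[g] for g in runs_by_speed}


rates = run_rates(games, runs)
for group in sorted(rates, reverse=True):
    print(f"speed {group}: {rates[group]:.1%} of games produced a run")
```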
The runs scored decrease as the players get slower, but irregularly. Of course, we can’t draw any firm conclusion just from this comparison, but each one of the 507 common game lines yields a data set like this. All we have to do now is combine them into one package. So how do we do that?
Here’s how I did that... well, I’d better explain. The packages are all or mostly all "speed loaded" in one direction or the other. For example, the code 30000010001006 (that is, sixth-place hitters who went 0-for-3 with a strikeout and a grounded-into-double-play) was represented by 4 games by "5" runners, and 46 games by "1" runners. Fast runners (a) usually don’t hit sixth, and (b) don’t ground into nearly as many double plays as the slow runners. On the other hand, the code 40000000000001 (that is, leadoff men who went 0-for-4) is represented by 924 "five" runners, but only 39 "one" runners, since leadoff men are usually fast.
Because of this "speed loading", we can’t just add up the totals for all of the codes, or anything simple like that. This is what I did. I multiplied the runs scored/game from each speed-and-code sample by the smallest number of games in that code sample. In other words, since there are at least 409 games in each of the five "speed codes" that make up the Game Code 41000000000003, then the data for each sample can be safely stated as if it had occurred in 409 games, since each one of them was AT LEAST 409 games. Since there were 481 games with that code by "five" runners and they scored 145 runs, we record that as 409/481 * 145, which is 123.2952. We enter that as 123.2952 "run credits" in 409 games.
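The scaling step can be sketched like this, assuming per-code dictionaries of games and runs keyed by speed group. The function name is hypothetical, and the "1" group is omitted from the runs because the text gives only its percentage.

```python
def run_credits(games_by_speed, runs_by_speed):
    """Rescale each group's runs to the smallest game count within the code.

    Returns (floor, credits): floor is the smallest number of games any
    speed group has for this code, and credits[g] is the number of runs
    group g would have scored at its observed rate in floor games.
    """
    floor = min(games_by_speed.values())
    credits = {
        g: runs_by_speed[g] * floor / games_by_speed[g] for g in runs_by_speed
    }
    return floor, credits


games = {5: 481, 4: 671, 3: 593, 2: 409, 1: 410}
runs = {5: 145, 4: 194, 3: 139, 2: 109}  # group 1 run total is not stated

floor, credits = run_credits(games, runs)
print(floor, round(credits[5], 4))
```

The group that defines the floor keeps its raw run total, while every larger group is scaled down. Adding up the floors and the credits over all retained codes then puts every speed group on the same number of "games", which is what makes the combined totals comparable.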
Well, all we have to do now is figure ALL of the run credits and all of the games for every cohort. We have plenty of data. We have 16,375 "Games" for each speed group. This is the data:
Speed   Games   Runs
5       16375   5771.108
4       16375   5623.86
3       16375   5392.048
2       16375   5185.623
1       16375   4958.463
Let me get rid of the decimals; those are just annoying:
Speed   Games   Runs
5       16375   5771
4       16375   5624
3       16375   5392
2       16375   5186
1       16375   4958
16,375 games represent a little over 100 seasons’ worth of full-time data. 100 seasons, 162 games a season, that’s 16,200 games for each group. Runs scored increase with each step up in speed. The fastest runners scored 813 more runs than the slowest runners, or 8 more runs per season.
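The arithmetic behind those figures, as a short check using the rounded run totals from the table. The variable names are illustrative only.

```python
GAMES_PER_GROUP = 16_375
GAMES_PER_SEASON = 162

runs = {5: 5771, 4: 5624, 3: 5392, 2: 5186, 1: 4958}

# Gain in runs for each one-step increase in speed group.
steps = {f"{s}->{s + 1}": runs[s + 1] - runs[s] for s in range(1, 5)}

gap = runs[5] - runs[1]                       # fastest minus slowest
pct_gap = gap / runs[1]                       # relative to the slowest group
seasons = GAMES_PER_GROUP / GAMES_PER_SEASON  # a little over 100
gap_per_season = gap / seasons

print(steps)
print(gap, f"{pct_gap:.1%}", round(seasons, 1), round(gap_per_season, 1))
```

Every step is positive, so the increase is monotonic across the five groups, and the top-to-bottom gap is the 16% quoted at the start of the article.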
You may note that the numbers are low. 4,958 runs in 100 seasons is only 50 runs per player per season, a very low total. I know why it is so low.
It’s because games with triples and home runs rarely get included in the study, because those games are less common. I actually don’t think there is ANY game code with a triple included in the study—lots and lots of doubles, walks, singles, some stolen bases, some games with two hits and three hits, a few home runs, but no games with four hits, and no triples. The more productive the game was for the hitter, the less likely it was to be a common line—and remember, we have to have 100 FROM ONE POSITION IN THE BATTING ORDER. This reduces the number of runs scored in each test sample.
Reducing the base (reducing the runs scored) makes the measured percentage larger than it actually is. If you had an 8-run difference with an 80-run base, that would be a 10% increase; it only measures at 16% because the base is artificially low. Also, the total base of 100 seasons is actually somewhat larger than 100 seasons, because we have excluded one-third of the games. It’s not actually 8 runs per season top to bottom; it’s probably more like 7 runs, assuming that there would be some additional "run benefit" in the unused part of the data.
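To make the point about the base concrete, here is the same 8-run gap expressed against two different per-season bases: roughly the 50 runs the study's sample produces, and the 80-run figure used above as a hypothetical. The function name is an assumption for illustration.

```python
def pct_increase(gap_runs, base_runs):
    """Gap expressed as a percentage of the base."""
    return 100 * gap_runs / base_runs


measured = pct_increase(8, 50)      # base depressed by the missing big games
hypothetical = pct_increase(8, 80)  # the 80-run base used as an example

print(measured, hypothetical)
```

The absolute gap is identical in both cases; only the denominator changes, which is why the runs-per-season figure is the safer number to quote.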
So there you have it. Very fast runners score about 7 to 8 more runs, per 162 games, than very slow runners with identical production from the same batting order position. Some of this separation may be explained by the tendency of teams to pinch run for slower players.
Thank you for reading, and thanks to John Rickert for the suggestion.