Speed and On Base Percentage
I had a question in the "Hey, Bill" section on Thursday, May 21st, which was, "Suppose that you have two players who are otherwise equal, except that one. . ." Well, here, let me quote the question as it was asked.
Would you rather have a fast leadoff man with a .350 OBP, or a slow leadoff man with a .375 OBP? All other things being equal.
Asked by: OwenH
Tom Tango weighed in on this question, and then I offered the thought that I’d probably rather have the fast guy because he would probably have more defensive value and thus would probably have the longer career. Then I thought of a study related to that issue, so I wasted the rest of the day doing that study, rather than doing the work I had promised someone else that I would do.
My study actually doesn’t focus on LEADOFF MEN; it focuses simply on PLAYERS. I changed the question to "Would you rather have a fast player with a .350 OBP, or a slower player with a .375 OBP, all other things being equal?"
If you have been reading this stuff for a while, you know how much I love to do matched-set studies, and I thought of a matched-set study to look at this issue. I started with all players who played 100 or more games in a season from 1900 to 2010; I think there were 18,454 such players.
For each of those players I created a seven-element code, based on:
1) His playing time, Games and Plate Appearances,
2) His batting average (20-point buckets, .300 to .319999, etc.),
3) His on base percentage (25-point buckets, .350 to .3749999, etc.),
4) His slugging percentage (40-point buckets),
5) His age,
6) His defensive position, and
7) His speed.
I could spend 20 pages explaining EXACTLY how all of this was done, but you know. . .how would reading that make your life better?
In the study there were several million potential codes, whereas there were only 18,000 and some players, so most players have unique codes; that is, most players have codes that do not exactly match any other player’s. 12,647 players had unique codes, but that still leaves 5,807 players whose codes were identical to some other player’s. The largest "code group" is the code 3 13 13 09 08 1 3, which is center fielders who were 26 or 27 years old, who ran well, and who hit .260 to .27999 with an on base percentage of .325 to .349999, a slugging percentage of .360 to .39999, and playing time equivalent to about 115 to 120 games. The eight players who fit into this box were Bernie Neis (1923), Taylor Douthit (1927), Jimmy Welsh (1930), Chuck Diering (1949), Jim Delsing (1952), Mike Devereaux (1989), Jermaine Allensworth (1998) and Mark Kotsay (2003).
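The three rate-stat numbers in that code fit a simple rule: divide the stat by its bucket width and round down (.260 to .27999 over .020 gives 13, .325 to .349999 over .025 gives 13, .360 to .39999 over .040 gives 9). Here is a minimal Python sketch of building a code under that assumption. The function and field names are hypothetical, and the playing-time, age, position and speed elements are simply passed in as integers, since the article doesn’t say exactly how those were grouped.

```python
from decimal import Decimal
from typing import NamedTuple

# Bucket widths from the text: 20 points of BA, 25 of OBP, 40 of SLG.
BA_WIDTH = Decimal("0.020")
OBP_WIDTH = Decimal("0.025")
SLG_WIDTH = Decimal("0.040")


class PlayerCode(NamedTuple):
    """Seven-element matching code; field order follows the list above."""
    playing_time: int  # already bucketed (grouping rule not given in the text)
    ba: int
    obp: int
    slg: int
    age: int           # already bucketed (grouping rule not given in the text)
    position: int
    speed: int         # already bucketed (grouping rule not given in the text)


def rate_bucket(rate: Decimal, width: Decimal) -> int:
    """Round rate / width down, e.g. .260 to .27999 -> 13 for 20-point buckets."""
    # Decimal keeps a rate sitting exactly on an edge (like .350) in the right bucket.
    return int(rate // width)


def make_code(playing_time: int, ba: Decimal, obp: Decimal, slg: Decimal,
              age: int, position: int, speed: int) -> PlayerCode:
    return PlayerCode(
        playing_time,
        rate_bucket(ba, BA_WIDTH),
        rate_bucket(obp, OBP_WIDTH),
        rate_bucket(slg, SLG_WIDTH),
        age,
        position,
        speed,
    )
```

Feeding in a .265/.330/.375 hitter, with the other four numbers copied from the example in the order they are printed, `make_code(3, Decimal(".265"), Decimal(".330"), Decimal(".375"), 8, 1, 3)` gives the code 3 13 13 9 8 1 3. Plain floats would be risky here: .350 divided by .025 in binary floating point can land a hair under 14 and fall into the wrong bucket.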
What I was interested in, though, was not players who had identical codes, but matched sets of players who had identical codes except that one had a higher on base percentage, and the other one was faster. To identify those, I created a "second code" for each player, which was the same as his first code, except that it was one group lower in on base percentage and one group higher in speed.
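The second-code trick turns the search into a dictionary lookup: index every player by his first code, then look up each player’s second code. A minimal sketch, assuming each code is a 7-tuple in the order of the list above (OBP bucket third, speed bucket seventh); the player names in the usage line are placeholders, not real players.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A code is a 7-tuple: (playing_time, ba, obp, slg, age, position, speed)
Code = Tuple[int, ...]
OBP, SPEED = 2, 6  # positions of the OBP and speed buckets in the tuple


def second_code(code: Code) -> Code:
    """Same code, but one OBP group lower and one speed group higher."""
    c = list(code)
    c[OBP] -= 1
    c[SPEED] += 1
    return tuple(c)


def matched_sets(codes: Dict[str, Code]) -> List[Tuple[str, str]]:
    """Return (higher-OBP player, faster player) pairs.

    Player A's second code equals player B's first code exactly when B is
    one OBP group lower and one speed group higher, with all else equal.
    """
    by_code: Dict[Code, List[str]] = defaultdict(list)
    for player, code in codes.items():
        by_code[code].append(player)
    return [(player, other)
            for player, code in codes.items()
            for other in by_code.get(second_code(code), [])]
```

For example, with `codes = {"A": (3, 13, 14, 9, 8, 1, 2), "B": (3, 13, 13, 9, 8, 1, 3), "C": (3, 13, 13, 9, 8, 1, 2)}`, only A and B pair up: A is one OBP group higher, B one speed group faster, and C differs from A in OBP alone.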
This process yielded a list of 1,987 Matched Sets of players—many more than I need to do my stupid little study. Some of those matches, however, are not really what we are looking for. For example, Rich Rollins in 1962 is a match with Robin Ventura in 1992.
| First | Last | YEAR | AB | R | H | HR | RBI | BB | SB | Avg | SPct | OPS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rich | Rollins | 1962 | 624 | 96 | 186 | 16 | 96 | 75 | 3 | .298 | .428 | .802 |
| Robin | Ventura | 1992 | 592 | 85 | 167 | 16 | 93 | 93 | 2 | .282 | .431 | .806 |
Rich Rollins and Robin Ventura were both 24-year-old third basemen, but what the chart above doesn’t tell you is that Ventura’s on base percentage was .375, and Rollins’s was .374. Sorting into 25-point buckets put Ventura into a higher OBP group than Rollins, but a one-point difference is not really what we’re looking for here.
So I put in a rule that, in order to qualify as a matched set, one of the players had to have an on base percentage at least 10 points higher than the other player’s. The grouping system allowed a separation of anywhere from 1 point to 49 points of on base percentage; I changed that to 10 points to 49 points, thus an average of about 25 points. That process eliminated 244 of my matches.
The same went for speed. I have "speed scores" for every player every year (1 to 10) that are a part of my regular data base, and these are pretty reliable, but the difference between a "7" and a "6" in "Speed" is not necessarily all that reliable, any more than the difference between a .375 on base percentage and a .374 on base percentage. I added a rule that the difference in Speed has to be at least 2 categories. This eliminated another 214 of my matches, leaving me with 1,519 matched sets, which is still a lot more than I need.
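The two extra rules are simple thresholds applied to each candidate pair. A minimal sketch; the `Season` record is hypothetical, the Speed Scores in the usage lines are made up for illustration, and OBP is compared in whole points so that .375 against .374 counts as exactly one point.

```python
from dataclasses import dataclass


@dataclass
class Season:
    name: str
    obp: float   # on base percentage, e.g. 0.375
    speed: int   # Speed Score, 1 to 10


def passes_filters(high_obp: Season, fast: Season,
                   min_obp_gap: int = 10, min_speed_gap: int = 2) -> bool:
    """Keep a matched set only if OBP differs by at least 10 points and
    Speed by at least 2 categories, each in the expected direction."""
    # Convert to whole points (thousandths) before subtracting.
    obp_gap = round(high_obp.obp * 1000) - round(fast.obp * 1000)
    speed_gap = fast.speed - high_obp.speed
    return obp_gap >= min_obp_gap and speed_gap >= min_speed_gap
```

A .375/.374 pair fails on OBP no matter how far apart the Speed Scores are, and a .360/.345 pair passes only if the faster player is at least two Speed categories ahead.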
Not all of those, however, are good matches. Some of them, frankly, are terrible matches. To cite two examples of comically bad matches. . . Jim Rice, 1987, winds up matched with Peanuts Lowrey, 1953, and Al Kaline, 1970, winds up matched with John Vander Wal, 2001.
Al Kaline in 1970 and John Vander Wal in 2001 were both 35-year-old right fielders. Kaline hit .278 with 16 homers, 71 RBI; Vander Wal hit .270 with 14 homers, 70 RBI, about the same doubles, triples and runs scored as Kaline—nonetheless, John Vander Wal is not a good match with Al Kaline, because Al Kaline is Al Kaline and John Vander Wal is John Vander Wal.
I created a 22-category Similarity Score to evaluate each of the Matched Sets. Kaline and Vander Wal were very close in the one-season categories that formed the match, but they were miles apart in the career-to-date categories of the Similarity system: Career Games, Career Hits, Career Homers, Career Total Bases and other things. If you remember Peanuts Lowrey (I am not old enough to remember Peanuts Lowrey, but I inherited his baseball card from a brother-in-law), you know how absurd the notion is of Jim Rice being similar to Peanuts Lowrey or, for that matter, to anybody nicknamed "Peanuts". The Lowrey/Rice match is, in a sense, even more absurd, because not only is there a massive difference in their careers up to the point of the comparison, but the seasons are not truly similar, either. Rice hit .277, Lowrey .269, and Lowrey actually had a higher slugging percentage than Rice did in that one year, .423 to .408. But Rice had 404 at bats, 13 homers and 62 RBI in the focus season; Lowrey had 182 at bats, 5 homers and 27 RBI. The bad match happened because the players who played in 100 or more games were sorted into 4 playing-time buckets, and the bucket for players near the 100-game limit was open-ended at the bottom; Rice was near the top of that playing-time group, and Lowrey was near the bottom.
The Similarity Score system had 17 "negative" categories and 5 "positive" categories. The five "reverse" categories were triples, stolen bases, speed, walks, and on base percentage. Whereas a difference between the two players in any other category was considered to be a negative, a difference in any of those five categories was considered to be a positive, since we were looking for players who were similar in all other respects, but different in speed and on base percentage.
In the Similarity System, points were deducted from the match if a right-handed batter was compared to a left-hander, and points were deducted for each year of difference in which the seasons occurred. In general, I didn’t want to match a season from 1930 with a season from 1980. I preferred to match a season from 1930 with a season from 1925, and a season from 1980 with a season from 1984. Fewer contextual problems.
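The scoring rule can be sketched as a running total: ordinary categories subtract points for every unit of difference, the five reverse categories add them, and a left/right mismatch costs a flat amount. This is a hedged illustration, not the author’s formula. The starting value of 1000 is an assumption borrowed from the familiar form of James’s similarity scores (it is consistent with the 960 cutoff used later), and the per-unit weights and the handedness penalty are placeholders.

```python
# The five "reverse" categories named in the text: a difference here counts
# in FAVOR of the match, because the study wants the two players to differ
# in speed and on base percentage.
REVERSE = {"3B", "SB", "Speed", "BB", "OBP"}


def similarity_score(a: dict, b: dict, weights: dict,
                     start: float = 1000.0,
                     handedness_penalty: float = 10.0) -> float:
    """Hypothetical similarity score for two player-seasons.

    `weights` maps each category to points per unit of difference (all
    values are assumptions). "Year" is treated as an ordinary category,
    so seasons played far apart lose points.
    """
    score = start
    for category, per_unit in weights.items():
        points = abs(a[category] - b[category]) * per_unit
        score += points if category in REVERSE else -points
    if a["Bats"] != b["Bats"]:  # right-handed batter vs. left-handed batter
        score -= handedness_penalty
    return score
```

With made-up weights `{"Year": 1.0, "HR": 2.0, "SB": 0.5}`, two hypothetical seasons five years apart, two homers apart and twenty steals apart, with opposite batting sides, score 1000 - 5 - 4 + 10 - 10 = 991; the steals gap raises the score because stolen bases are a reverse category.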
Let’s talk now about some of the better matches, some of the good ones. Jody Reed, 1991, was a 29-year-old second baseman; Damaso Garcia, 1984, was a 28-year-old second baseman. Their stats were extremely similar (below), except that Reed drew 60 walks to Garcia’s 16, and Garcia stole 46 bases to Reed’s 6.
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jody | Reed | 1991 | 153 | 618 | 87 | 175 | 42 | 2 | 5 | 60 | 60 | 53 | 6 | .283 |
| Damaso | Garcia | 1984 | 152 | 633 | 79 | 180 | 32 | 5 | 5 | 46 | 16 | 46 | 46 | .284 |
Reed and Garcia also had similar career batting stats at that point, Reed hitting .288 with 14 homers in 572 games, and Garcia hitting .289 with 18 homers in 663 games. We will pause here to note, regarding OwenH’s question, that Reed scored eight more runs than Garcia, 87-79. Garcia batted leadoff for Toronto, whereas Reed batted second for Boston. The offenses of the two teams were similar, Boston scoring 731 runs, Toronto 750. So. . . make what you will out of a sample of one.
A second very good example which is recent enough for some of you to relate to is Chet Lemon, 1979, and Andre Dawson, 1980. These are their primary batting stats:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Andre | Dawson | 1980 | 151 | 577 | 96 | 178 | 41 | 7 | 17 | 87 | 44 | 69 | 34 | .308 |
| Chet | Lemon | 1979 | 148 | 556 | 79 | 177 | 44 | 2 | 17 | 86 | 56 | 68 | 7 | .318 |
Lemon was then 24, Dawson 25. They had very similar seasons, and also essentially similar careers up to that point, except that Lemon walked more than Dawson and Dawson was much faster than Lemon. Lemon was also an exceptional defensive center fielder, but he was not as fast as Dawson.
Another interesting matchup is Magglio Ordonez, 2001, against Raul Mondesi, 1997, both right fielders. Ordonez drew more walks; Mondesi was faster:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Magglio | Ordonez | 2001 | 160 | 593 | 97 | 181 | 40 | 1 | 31 | 113 | 70 | 70 | 25 | .305 |
| Raul | Mondesi | 1997 | 159 | 616 | 95 | 191 | 42 | 5 | 30 | 87 | 44 | 105 | 32 | .310 |
Jack Clark, 1982, is matched with Johnny Callison, 1965:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jack | Clark | 1982 | 157 | 563 | 90 | 154 | 30 | 3 | 27 | 103 | 90 | 91 | 6 | .274 |
| Johnny | Callison | 1965 | 160 | 619 | 93 | 162 | 25 | 16 | 32 | 101 | 57 | 117 | 6 | .262 |
Roger Maris and Willie Kirkland, both right fielders born in 1934, had similar seasons in 1959; Kirkland was faster and Maris walked more:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Roger | Maris | 1959 | 122 | 433 | 69 | 118 | 21 | 7 | 16 | 72 | 58 | 53 | 2 | .273 |
| Willie | Kirkland | 1959 | 126 | 463 | 64 | 126 | 22 | 3 | 22 | 68 | 42 | 84 | 5 | .272 |
Dwight Evans, 1985, matches Bobby Bonds, 1978:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dwight | Evans | 1985 | 159 | 617 | 110 | 162 | 29 | 1 | 29 | 78 | 114 | 105 | 7 | .263 |
| Bobby | Bonds | 1978 | 156 | 565 | 93 | 151 | 19 | 4 | 31 | 90 | 79 | 120 | 43 | .267 |
These are all good matches, but the BEST match in the data, by the Similarity System, is actually two leadoff men from 100 years ago: Clyde Milan, with the 1908 Washington Senators, and Whitey Witt, with the 1917 Philadelphia Athletics:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Whitey | Witt | 1917 | 128 | 452 | 62 | 114 | 13 | 4 | 0 | 28 | 65 | 45 | 12 | .252 |
| Clyde | Milan | 1908 | 130 | 485 | 55 | 116 | 10 | 12 | 1 | 32 | 38 | -- | 29 | .239 |
Milan was the Vince Coleman of his era, a player who would later steal 88 bases in a season. Witt ran well, but not that well, and walked more. Both men played for bad teams. Milan’s Senators, 67-85, scored 479 runs. Witt’s Athletics, 55-98, scored 529. But again, we see that Witt scored more runs than Milan, and did so while making 28 fewer outs.
Pie Traynor, the Pirates’ 24-year-old third baseman in 1924, matches Bob Elliott, the 25-year-old third baseman for the same team a generation later:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pie | Traynor | 1924 | 142 | 545 | 86 | 160 | 26 | 13 | 5 | 82 | 37 | 26 | 24 | .294 |
| Bob | Elliott | 1942 | 143 | 560 | 75 | 166 | 26 | 7 | 9 | 89 | 52 | 35 | 2 | .296 |
Carlos Pena, a 25-year-old, left-handed hitting first baseman from 2003, matches Hee Seop Choi, a 25-year-old, left-handed hitting first baseman from 2004:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hee Seop | Choi | 2004 | 126 | 343 | 53 | 86 | 21 | 1 | 15 | 46 | 63 | 96 | 1 | .251 |
| Carlos | Pena | 2003 | 131 | 452 | 51 | 112 | 21 | 6 | 18 | 50 | 53 | 123 | 4 | .248 |
In order to qualify for the study, the two seasons (and careers) had to have a similarity score of at least 960. This (below) is actually the WORST matchup that qualifies for the study; it scores at 960.02:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Magglio | Ordonez | 2003 | 160 | 606 | 95 | 192 | 46 | 3 | 29 | 99 | 57 | 73 | 9 | .317 |
| Babe | Herman | 1931 | 151 | 610 | 93 | 191 | 43 | 16 | 18 | 97 | 50 | 65 | 17 | .313 |
Ordonez did this in 2003, Herman 72 years earlier, so that creates a relatively massive 24-point penalty for the comparison. It recovers, however, because Herman hit 16 triples to Ordonez’ 3, which creates a 26-point POSITIVE for the match, since a difference in triples scores as a positive. That enables the Ordonez/Herman match to scrape over the barrier, and be included in the study.
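The two adjustments in that paragraph imply rates of one point per three years of separation (72 years, 24 points) and two points per triple of difference (13 triples, 26 points). A small sketch of that arithmetic; the rates are inferred from this one example rather than stated in the article, and the names are hypothetical.

```python
YEARS_PER_POINT = 3     # inferred: 72 years apart -> 24-point penalty
POINTS_PER_TRIPLE = 2   # inferred: 13 triples apart -> 26-point bonus
QUALIFYING_SCORE = 960  # minimum similarity score to enter the study


def year_penalty(year_a: int, year_b: int) -> float:
    """Points lost for the gap between the two seasons."""
    return abs(year_a - year_b) / YEARS_PER_POINT


def triples_bonus(triples_a: int, triples_b: int) -> int:
    """Triples are a "reverse" category, so a difference ADDS points."""
    return abs(triples_a - triples_b) * POINTS_PER_TRIPLE


penalty = year_penalty(2003, 1931)  # Ordonez vs. Herman
bonus = triples_bonus(3, 16)
net = bonus - penalty
print(f"penalty {penalty:.0f}, bonus {bonus}, net {net:+.0f}")
print("qualifies:", 960.02 >= QUALIFYING_SCORE)
```

This prints `penalty 24, bonus 26, net +2` and `qualifies: True`. The two components alone net out at plus 2; the rest of the 960.02 comes from the other categories, which the article doesn’t break down.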
Steve Garvey, a 32-year-old first baseman in 1981, is matched against Ted Kluszewski, a 33-year-old first baseman in 1958:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Steve | Garvey | 1981 | 110 | 431 | 63 | 122 | 23 | 1 | 10 | 64 | 25 | 49 | 3 | .283 |
| Ted | Kluszewski | 1958 | 100 | 301 | 29 | 88 | 13 | 4 | 4 | 37 | 26 | 16 | 0 | .292 |
But that match doesn’t qualify. Garvey at that time had played 1,565 career games with a .303 career average, 195 home runs, and 2,743 total bases, and Kluszewski had 1,439 games, 255 homers, a .302 average and 2,663 total bases. It’s a pretty decent match, but it’s not good enough for the study.
I wound up with 1,237 qualifying matches.
Conclusion
I would suspect, I would believe, that if I had stuck with the original question which was posed, I would have reached the opposite conclusion. The original question was whether a fast leadoff man with a .350 on base percentage would be preferred to a slower leadoff man with a .375 on base percentage, "preferred" probably meaning that he would score more runs. I would strongly suspect that the general answer to THAT question would be that the slow guy who got on base more would score more runs.
However, my instinctive guess that speed would be more important in shaping the future careers of the otherwise-even players was definitively correct. This chart shows that the two groups of players had similar career numbers, up through the point at which the players were matched:
| Test Group | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Speed | 633 | 2225 | 301 | 606 | 101 | 26 | 42 | 258 | 173 | 269 | 65 | .268 |
| High OBA | 649 | 2226 | 299 | 608 | 105 | 19 | 51 | 286 | 235 | 271 | 31 | .268 |
| Test Group | BB | SB | Avg | OBA | SPct | OPS |
|---|---|---|---|---|---|---|
| Speed | 173 | 65 | .268 | .321 | .386 | .707 |
| High OBA | 235 | 31 | .268 | .339 | .391 | .730 |
All players were matched against other players who played the same defensive position, and there is a ten-day difference in the average age of the combatants, the "Speed" players averaging 27.68 years, and the "On Base" players averaging 27.71.
The High-OBA group had a higher OPS, because they drew more walks although their batting averages were the same. However, in the rest of their careers, the Speed group pulled 5 points ahead in batting average, reduced the gap in OBA from 18 points to 12, eliminated the 5-point gap in slugging percentage, and moved further ahead in stolen bases, increasing the gap from two to one (65-31) to three to one (53-18).
| Test Group | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Speed | 692 | 2345 | 306 | 633 | 109 | 21 | 48 | 282 | 198 | 286 | 53 | .254 |
| High OBA | 613 | 2005 | 260 | 539 | 95 | 13 | 51 | 267 | 229 | 258 | 18 | .249 |
| Test Group | BB | SB | Avg | OBA | SPct | OPS |
|---|---|---|---|---|---|---|
| Speed | 198 | 53 | .254 | .311 | .365 | .677 |
| High OBA | 229 | 18 | .249 | .323 | .364 | .687 |
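The gaps described in the paragraph before the rest-of-career table can be read straight off the two OBA/SPct summaries. A small check in Python, with rate stats stored as whole points (thousandths) copied from the tables; the names are illustrative.

```python
# Rate stats in points (thousandths); SB as counts. Copied from the tables.
BEFORE = {"Speed":    {"Avg": 268, "OBA": 321, "SPct": 386, "SB": 65},
          "High OBA": {"Avg": 268, "OBA": 339, "SPct": 391, "SB": 31}}
AFTER = {"Speed":    {"Avg": 254, "OBA": 311, "SPct": 365, "SB": 53},
         "High OBA": {"Avg": 249, "OBA": 323, "SPct": 364, "SB": 18}}


def gaps(table: dict) -> dict:
    """High-OBA minus Speed, in points; stolen bases as a Speed/High-OBA ratio."""
    speed, high = table["Speed"], table["High OBA"]
    gap = {stat: high[stat] - speed[stat] for stat in ("Avg", "OBA", "SPct")}
    gap["SB ratio"] = speed["SB"] / high["SB"]
    return gap


for label, table in (("through the match", BEFORE), ("rest of career", AFTER)):
    print(label, gaps(table))
```

Through the match, the High-OBA group leads by 0, 18 and 5 points in Avg, OBA and SPct, and the Speed group steals about 2.1 times as many bases. For the rest of their careers the figures are -5, 12 and -1 points (negative meaning the Speed group is ahead), and the steals ratio is about 2.9.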
But of most significance, the Speed Group played an average of 79 additional games in the rest of their careers. With 1,237 players in each group, that creates a separation in "rest of career games" of a whopping 98,000 games—855,933 to 757,912. Obviously, a gap of that size cannot result from chance. There were 690 sets of players in which the "speed" player played more games in the rest of his career, 4 sets in which the two played exactly the same number of remaining games, and 543 cases in which the "OBA" player played more games. 56% of the time, the "Speed" player had the longer career after the match.
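The arithmetic in this paragraph can be reproduced from the totals it gives. The block below also adds a sign test, which the article itself does not use: ignoring the 4 ties, if speed made no difference, each of the remaining 1,233 sets would be a coin flip, so the 690 Speed-player wins can be compared with that coin-flip expectation using a normal approximation. It is a rough outside check, not part of the original study.

```python
import math

SETS = 1237
speed_games, oba_games = 855_933, 757_912     # rest-of-career games, from the text
speed_longer, ties, oba_longer = 690, 4, 543  # head-to-head results, from the text
assert speed_longer + ties + oba_longer == SETS

gap_per_set = (speed_games - oba_games) / SETS  # extra games per Speed player
speed_share = speed_longer / SETS               # share of sets won by speed

# Sign test, normal approximation, ties dropped.
n = speed_longer + oba_longer
z = (speed_longer - n / 2) / math.sqrt(n / 4)
p_two_sided = math.erfc(z / math.sqrt(2))

print(f"{gap_per_set:.1f} games per set, {speed_share:.0%} of sets, "
      f"z = {z:.2f}, p = {p_two_sided:.1e}")
```

The gap works out to a little over 79 games per set and 56% of sets, matching the text. The z-score comes out a little above 4, far outside what coin flips would plausibly produce, which is consistent with the author’s statement that the gap is not a product of chance.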