Speed and On Base Percentage

By Bill James

May 22, 2020

I had a question in the "Hey, Bill" section on Thursday, May 21st, which was, "Suppose that you have two players who are otherwise equal, except that one. . ." well, here, let me quote the question as it was asked.

Would you rather have a fast leadoff man with a .350 OBP, or a slow leadoff man with a .375 OBP? All other things being equal.

Asked by: OwenH

Tom Tango weighed in on this question, and then I offered the thought that I’d probably rather have the fast guy because he would probably have more defensive value and thus would probably have the longer career. Then I thought of a study related to that issue, so I wasted the rest of the day doing that study, rather than doing the work I had promised someone else that I would do.

My study actually doesn’t focus on LEADOFF MEN; it focuses simply on PLAYERS. I changed the question to "Would you rather have a fast player with a .350 OBP, or a slower player with a .375 OBP, all other things being equal?"

If you have been reading this stuff for a while, you know how much I love to do matched-set studies, and I thought of a matched-set study to look at this issue. I started with all players who played 100 games in a season from the years 1900 to 2010; I think there were 18,454 such players.

For each of those players I created a seven-element code, based on:

1) His playing time, Games and Plate Appearances,

2) His batting average (20-point buckets, .300 to .319999, etc.),

3) His on base percentage (25-point buckets, .350 to .3749999, etc.),

4) His slugging percentage (40-point buckets),

5) His age,

6) His defensive position, and

7) His speed.

I could spend 20 pages explaining EXACTLY how all of this was done, but you know. . .how would reading that make your life better?

In the study there were several million potential codes, whereas there were only 18,000 and some players, so most players have unique codes; that is, most players have codes that do not exactly match any other player's. 12,647 players had unique codes, but that still leaves 5,807 players who had codes identical to some other player. The largest "code group" is the code 3 13 13 09 08 1 3, which is center fielders who were 26 or 27 years old, ran well and who hit .260 to .27999 with an on base percentage of .325 to .349999, a slugging percentage of .360 to .39999, and playing time equivalent to about 115 to 120 games. The eight players who fit into this box were Bernie Neis (1923), Taylor Douthit (1927), Jimmy Welsh (1930), Chuck Diering (1949), Jim Delsing (1952), Mike Devereaux (1989), Jermaine Allensworth (1998) and Mark Kotsay (2003).
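As a rough illustration of the coding step, here is a minimal Python sketch. The exact encoding is not spelled out in this article, so everything in it is an assumption: the field names, the floor-division bucketing, the order of the elements, and the idea that playing time, age and speed arrive as pre-assigned group numbers. It is a sketch of the idea, not the actual spreadsheet method.

```python
import math
from collections import defaultdict


def bucket(rate, width_points):
    """Fixed-width bucket index for a rate stat measured in points (thousandths)."""
    # The small epsilon guards against float error right at a boundary like .350.
    points = math.floor(rate * 1000 + 1e-6)
    return points // width_points


def season_code(season):
    """Build a seven-element code from one player-season (hypothetical dict keys)."""
    return (
        season["time_group"],        # assumed: playing-time group, already assigned
        bucket(season["avg"], 20),   # 20-point batting average buckets
        bucket(season["obp"], 25),   # 25-point on base percentage buckets
        bucket(season["slg"], 40),   # 40-point slugging percentage buckets
        season["age_group"],         # assumed: age group, already assigned
        season["position"],          # defensive position code
        season["speed_group"],       # assumed: speed group, already assigned
    )


def code_groups(seasons):
    """Map each code to the list of player-seasons that share it."""
    groups = defaultdict(list)
    for season in seasons:
        groups[season_code(season)].append(season["name"])
    return groups
```

With this bucketing, a .374 on base percentage lands in bucket 14 and a .375 lands in bucket 15. The three rate-stat ranges quoted for the largest group come out as 13, 13 and 9, which is consistent with the "13 13 09" in the code above, though that agreement may be a coincidence.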

What I was interested in, though, was not players who had identical codes, but matched sets of players who had identical codes except that one had a higher on base percentage, and the other one was faster. To identify those, I created a "second code" for each player, which was the same as the other code, except the player in his second code was one group lower in his on base percentage, but one group higher in his speed.
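The "second code" trick can be sketched in a few lines of Python. This assumes each code is a tuple with on base percentage in the third slot and speed in the last slot, and that a higher speed group means a faster player; those positions and the function names are illustrative assumptions, not details given in the article.

```python
from collections import defaultdict

OBP_IDX, SPEED_IDX = 2, 6  # assumed positions of OBP and speed within the code


def second_code(code):
    """Same code, but one OBP group lower and one speed group higher."""
    shifted = list(code)
    shifted[OBP_IDX] -= 1
    shifted[SPEED_IDX] += 1
    return tuple(shifted)


def matched_sets(coded_seasons):
    """coded_seasons: list of (label, code) pairs.

    Returns (higher_obp_label, faster_label) pairs in which the second
    player's own code equals the first player's second code.
    """
    by_code = defaultdict(list)
    for label, code in coded_seasons:
        by_code[code].append(label)

    pairs = []
    for label, code in coded_seasons:
        for partner in by_code.get(second_code(code), []):
            pairs.append((label, partner))
    return pairs
```

Indexing the seasons by code first means each player needs only one lookup, rather than a comparison against all 18,000-plus other seasons.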

This process yielded a list of 1,987 Matched Sets of players—many more than I need to do my stupid little study. Some of those matches, however, are not really what we are looking for. For example, Rich Rollins in 1962 is a match with Robin Ventura in 1992.

First	Last	YEAR	AB	R	H	HR	RBI	BB	SB	Avg	SPct	OPS
Rich	Rollins	1962	624	96	186	16	96	75	3	.298	.428	.802
Robin	Ventura	1992	592	85	167	16	93	93	2	.282	.431	.806

Rich Rollins and Robin Ventura were both 24-year-old third basemen, but what the chart above doesn’t tell you is that Ventura’s on base percentage was .375, and Rollins was .374. Sorting them by 25-point buckets, that put Ventura into a higher OBP group than Rollins, but it’s not really what we’re looking for here.

So I put in a rule that, in order to qualify as a matched set, one of the players had to have an on base percentage at least 10 points higher than the other player. The grouping system created a separation potential of 1 point to 49 points of on base percentage; I changed to 10 points to 49 points—thus, an average of about 25 points. That process eliminated 244 of my matches.

A similar thing with speed. I have "speed scores" for every player every year (1 to 10) that are a part of my regular data base, and these are pretty reliable, but the difference between a "7" and a "6" in "Speed" is not necessarily all that reliable, like the difference between a .375 on base percentage and a .374 on base percentage. I added a rule that the difference in Speed has to be at least 2 categories. This eliminated another 214 of my matches, leaving me with 1,519 matched sets, which is still a lot more than I need.
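The two separation rules amount to a simple filter on each candidate pair. A minimal Python sketch, assuming each player-season is a dict with hypothetical "obp" and "speed" keys, where "speed" is the 1-to-10 speed score:

```python
def passes_separation_rules(high_obp, fast, min_obp_points=10, min_speed_gap=2):
    """True if the pair is far enough apart in both OBP and speed score."""
    # Compare OBP in whole points (thousandths) to avoid float noise.
    obp_gap = round((high_obp["obp"] - fast["obp"]) * 1000)
    speed_gap = fast["speed"] - high_obp["speed"]
    return obp_gap >= min_obp_points and speed_gap >= min_speed_gap
```

Under this rule a .375 against a .374 fails regardless of speed, which is the Ventura/Rollins situation, and a pair whose speed scores differ by only one category fails regardless of on base percentage.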

Not all of those, however, are good matches. Some of them frankly are terrible matches. To cite two examples of comically bad matches. . . Jim Rice, 1987, winds up matched with Peanuts Lowrey, 1953, and Al Kaline, 1970, winds up matched with John Vander Wal, 2001.

Al Kaline in 1970 and John Vander Wal in 2001 were both 35-year-old right fielders. Kaline hit .278 with 16 homers, 71 RBI; Vander Wal hit .270 with 14 homers, 70 RBI, about the same doubles, triples and runs scored as Kaline—nonetheless, John Vander Wal is not a good match with Al Kaline, because Al Kaline is Al Kaline and John Vander Wal is John Vander Wal.

I created a 22-point Similarity Score to evaluate each of the Matched Sets. Kaline and Vander Wal were very close in the one-season categories that formed the match, but they were miles apart in the career-to-date categories of the Similarity system: Career Games, Career Hits, Career Homers, Career Total Bases and other things. If you remember Peanuts Lowrey (I am not old enough to remember Peanuts Lowrey, but I inherited his baseball card from a brother-in-law), you know how absurd the notion is of Jim Rice being similar to Peanuts Lowrey or, for that matter, anybody nicknamed "Peanuts". The Lowrey/Rice match is, in a sense, even more absurd, because not only is there a massive difference in their careers up to the point of the comparison, but the seasons are not truly similar, either. Rice hit .277, Lowrey .269, and Lowrey actually had a higher slugging percentage than Rice did in that one year, .423 to .408. But Rice had 404 at bats, 13 homers, 62 RBI in the focus season; Lowrey had 182 at bats, 5 homers, 27 RBI. The bad match happened because the players who played in 100 or more games were sorted into 4 buckets, and those players who were near the 100-game limit had an open end on the bottom; Rice was near the top of that playing-time group, and Lowrey was near the bottom.

The Similarity Score system had 17 "negative" categories and 5 "positive" categories. The five "reverse" categories were triples, stolen bases, speed, walks, and on base percentage. Whereas a difference between the two players in any other category was considered to be a negative, a difference in any of those five categories was considered to be a positive, since we were looking for players who were similar in all other respects, but different in speed and on base percentage.

In the Similarity System, points were deducted from the match if a right-handed batter was compared to a left-hander, and points were deducted for each year of difference in which the seasons occurred. In general, I didn’t want to match a season from 1930 with a season from 1980. I preferred to match a season from 1930 with a season from 1925, and a season from 1980 with a season from 1984. Fewer contextual problems.
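The shape of that scoring can be sketched in Python. The 1,000-point starting value, the handedness penalty, and every category weight passed in are assumptions made for illustration. The only weights that can be read off this article are about one-third of a point per year of separation and two points per triple, and those follow from the Ordonez/Herman example further down only if the scoring is linear.

```python
def similarity(a, b, negative_weights, reverse_weights,
               handedness_penalty=10.0, points_per_year=1 / 3, base=1000.0):
    """Similarity score for two player-seasons given as dicts (hypothetical keys).

    negative_weights: stat -> points deducted per unit of difference.
    reverse_weights:  stat -> points ADDED per unit of difference
                      (triples, stolen bases, speed, walks, on base percentage).
    """
    score = base
    for stat, weight in negative_weights.items():
        score -= weight * abs(a[stat] - b[stat])
    for stat, weight in reverse_weights.items():
        score += weight * abs(a[stat] - b[stat])
    if a["bats"] != b["bats"]:
        score -= handedness_penalty  # assumed size of the right/left deduction
    score -= points_per_year * abs(a["year"] - b["year"])
    return score
```

The design point is that the five reverse categories push the score up rather than down, so a pair is rewarded for differing in exactly the ways the study cares about and penalized for differing in any other way.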

Let’s talk now about some of the better matches, some of the good ones. Jody Reed, 1991, was a 29-year-old second baseman; Damaso Garcia, 1984, was a 28-year-old second baseman. Their stats were extremely similar (below), except that Reed drew 60 walks to Garcia’s 16, and Garcia stole 46 bases to Reed’s 6.

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Jody	Reed	1991	153	618	87	175	42	2	5	60	60	53	6	.283
Damaso	Garcia	1984	152	633	79	180	32	5	5	46	16	46	46	.284

Reed and Garcia also had similar career batting stats at that point, Reed hitting .288 with 14 homers in 572 games, and Garcia hitting .289 with 18 homers in 663 games. We will pause here to note, regarding OwenH’s question, that Reed scored eight more runs than Garcia, 87-79. Garcia batted leadoff for Toronto, whereas Reed batted second for Boston. The offenses of the two teams were similar, Boston scoring 731 runs, Toronto 750. So. . . make what you will out of a sample of one.

A second very good example which is recent enough for some of you to relate to is Chet Lemon, 1979, and Andre Dawson, 1980. These are their primary batting stats:

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Andre	Dawson	1980	151	577	96	178	41	7	17	87	44	69	34	.308
Chet	Lemon	1979	148	556	79	177	44	2	17	86	56	68	7	.318

Lemon was then 24, Dawson 25. They had very similar seasons, and also essentially similar careers up to that point, except that Lemon walked more than Dawson, and Dawson was much faster than Lemon. Lemon was also an exceptional defensive center fielder, just not as fast as Dawson.

Another interesting matchup is Magglio Ordonez, 2001, against Raul Mondesi, 1997, both right fielders. Ordonez drew more walks; Mondesi was faster:

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Magglio	Ordonez	2001	160	593	97	181	40	1	31	113	70	70	25	.305
Raul	Mondesi	1997	159	616	95	191	42	5	30	87	44	105	32	.310

Jack Clark, 1982, is matched with Johnny Callison, 1965:

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Jack	Clark	1982	157	563	90	154	30	3	27	103	90	91	6	.274
Johnny	Callison	1965	160	619	93	162	25	16	32	101	57	117	6	.262

Roger Maris and Willie Kirkland, both right fielders born in 1934, had similar seasons in 1959; Kirkland was faster and Maris walked more:

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Roger	Maris	1959	122	433	69	118	21	7	16	72	58	53	2	.273
Willie	Kirkland	1959	126	463	64	126	22	3	22	68	42	84	5	.272

Dwight Evans, 1985, matches Bobby Bonds, 1978:

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Dwight	Evans	1985	159	617	110	162	29	1	29	78	114	105	7	.263
Bobby	Bonds	1978	156	565	93	151	19	4	31	90	79	120	43	.267

These are all good matches, but the BEST match in the data, by the Similarity System, is actually two leadoff men from 100 years ago: Clyde Milan, with the 1908 Washington Senators, and Whitey Witt, with the 1917 Philadelphia Athletics:

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Whitey	Witt	1917	128	452	62	114	13	4	0	28	65	45	12	.252
Clyde	Milan	1908	130	485	55	116	10	12	1	32	38	--	29	.239

Milan was the Vince Coleman of his era, a player who would later steal 88 bases in a season. Witt ran well, but not that well, and walked more. Both men played for bad teams. Milan’s Senators, 67-85, scored 479 runs. Witt’s Athletics, 55-98, scored 529. But again, we see that Witt scored more runs than Milan, and did so while making 28 fewer outs.

Pie Traynor, Pirates 24-year-old third baseman in 1924, matches Bob Elliott, 25-year-old third baseman for the same team a generation later:

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Pie	Traynor	1924	142	545	86	160	26	13	5	82	37	26	24	.294
Bob	Elliott	1942	143	560	75	166	26	7	9	89	52	35	2	.296

Carlos Pena, a 25-year-old, left-handed hitting first baseman from 2003, matches Hee Seop Choi, a 25-year-old, left-handed hitting first baseman from 2004:

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Hee Seop	Choi	2004	126	343	53	86	21	1	15	46	63	96	1	.251
Carlos	Pena	2003	131	452	51	112	21	6	18	50	53	123	4	.248

In order to qualify for the study, the two seasons (and careers) had to have a similarity score of at least 960. This (below) is actually the WORST matchup that qualifies for the study; it scores at 960.02:

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Magglio	Ordonez	2003	160	606	95	192	46	3	29	99	57	73	9	.317
Babe	Herman	1931	151	610	93	191	43	16	18	97	50	65	17	.313

Ordonez did this in 2003, Herman 72 years earlier, so that creates a relatively massive 24-point penalty for the comparison. It recovers, however, because Herman hit 16 triples to Ordonez’ 3, which creates a 26-point POSITIVE for the match, since a difference in triples scores as a positive. That enables the Ordonez/Herman match to scrape over the barrier, and be included in the study.

Steve Garvey, a 32-year-old first baseman in 1981, is matched against Ted Kluszewski, a 33-year-old first baseman in 1958:

First	Last	YEAR	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg
Steve	Garvey	1981	110	431	63	122	23	1	10	64	25	49	3	.283
Ted	Kluszewski	1958	100	301	29	88	13	4	4	37	26	16	0	.292

But that match doesn’t qualify. Garvey at that time had played 1,565 career games with a .303 career average, 195 home runs, and 2,743 total bases, and Kluszewski had 1,439 games, 255 homers, a .302 average and 2,663 total bases. It’s a pretty decent match, but it’s not good enough for the study.

I wound up with 1,237 qualifying matches.

Conclusion

I would suspect, I would believe, that if I had stuck with the original question which was posed, I would have reached the opposite conclusion. The original question was whether a fast leadoff man with a .350 on base percentage would be preferred to a slower leadoff man with a .375 on base percentage, "preferred" probably meaning that he would score more runs. I would strongly suspect that the general answer to THAT question would be that the slow guy who got on base more would score more runs.

However, my instinctive guess that speed would be more important in shaping the future careers of the otherwise-even players was definitively correct. This chart shows that the two groups of players had similar career numbers, up through the point at which the players were matched:

Test Group	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg	OBA	SPct	OPS
Speed	633	2225	301	606	101	26	42	258	173	269	65	.268	.321	.386	.707
High OBA	649	2226	299	608	105	19	51	286	235	271	31	.268	.339	.391	.730

All players were matched against other players who played the same defensive position, and there is a ten-day difference in the average age of the combatants, the "Speed" players averaging 27.68 years, and the "On Base" players averaging 27.71.

The High-OBA group had a higher OPS, because they drew more walks although their batting averages were the same. However, in the rest of their careers, the Speed group pulled 5 points ahead in batting average, reduced the gap in OBA from 18 points to 12, eliminated the 5-point gap in slugging percentage, and moved further ahead in stolen bases, increasing the gap from two to one (65-31) to three to one (53-18).

Test Group	G	AB	R	H	2B	3B	HR	RBI	BB	SO	SB	Avg	OBA	SPct	OPS
Speed	692	2345	306	633	109	21	48	282	198	286	53	.254	.311	.365	.677
High OBA	613	2005	260	539	95	13	51	267	229	258	18	.249	.323	.364	.687

But of most significance, the Speed Group played an average of 79 additional games in the rest of their careers. With 1,237 players in each group, that creates a separation in "rest of career games" of a whopping 98,000 games—855,933 to 757,912. Obviously, a gap of that size cannot result from chance. There were 690 sets of players in which the "speed" player played more games in the rest of his career, 4 sets in which the two played exactly the same number of remaining games, and 543 cases in which the "OBA" player played more games. 56% of the time, the "Speed" player had the longer career after the match.
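One way to put a number on "cannot result from chance" is an exact sign test on the 690-to-543 split, dropping the 4 ties. This check is not part of the original study. It assumes the matched sets are independent of one another, which may not strictly hold if the same player-season appears in more than one set.

```python
from math import comb


def sign_test_p_value(wins, losses):
    """Two-sided exact binomial (sign) test against a 50/50 split, ties excluded."""
    n = wins + losses
    k = max(wins, losses)
    # Number of equally likely outcomes at least as lopsided as the observed one.
    tail = sum(comb(n, i) for i in range(k, n + 1))
    return min(1.0, 2 * tail / 2 ** n)


p = sign_test_p_value(690, 543)
print(f"two-sided p-value: {p:.2e}")
```

For this split the p-value comes out far below conventional thresholds such as 0.001. Note that the test speaks only to how often the faster player had the longer remaining career, not to the size of the 79-game average gap.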

COMMENTS (9 Comments, most recent shown first)

MarisFan61
Bill: Thanks for the reply about what differential you required on on-base.
I did assume that the average differential would be substantial, but where I was coming from was, it seems to me that including ANY matched pairs with as small a differential as .010 would mean that you've got pairs where there just isn't a meaningful difference at all on that important criterion.
Seems to me that with a matched pair where the difference in on-base average is just .010 or even a little more than that, the differential is sort of just "noise" -- and that you wouldn't want such a pair in the data at all.
2:03 PM May 23rd

bjames
SteveN
I'd love to know how you do matched pairs. I'm guessing that you use baseball reference, but, I have no idea how to do this. Since I am a month or so older than you, crap, I have plenty of time to fiddle with this sort of stuff.

It's not done with Baseball Reference at all. I keep a "spreadsheet encyclopedia" in excel of all batters/seasons. I maintain it myself. It has some flaws and failings, but since I know what those are, I can usually skate around them. The matched-set studies are usually done by sorting seasons in Excel, either in that file or in the similar one I maintain for pitchers.

I'd be happy to share the spreadsheet with others, but it needs to be . . ..what's the word? Upgraded? Rectified? Corrected? You'd be welcome to it, but it has too many flaws to be distributed as is.
1:50 PM May 23rd

bjames
From MarisFan61

Just a couple of questions about your choices on the criteria:

-- Of course it was good to insist on more than (for example) just a 1-point difference (i.e. .001) in on-base average, i.e. to not let it be enough just that 2 players were in different "categories." But, as I was reading down that paragraph, I was surprised you settled for just a 10-point difference.

You are focusing on one factor, but there are three factors operating. Owen had specified a 25-point separation in on base percentage. If the groups are 25 points wide, a player in a higher group could be as little as 1 point higher (.000000001 or something) or as much as 49 points higher. It would average 25 points.

But the players were selected to be otherwise SIMILAR, which would reduce the average below 25 points, on average.

However, I then put in the "similarity" routine, in which a difference in on base percentage was treated as a positive. This would push the average back up, toward 25 points.

In fact, the average on base percentage of the 1,237 "faster but lower on base percentage" players in this study, in the base season, was .320. The average on base percentage of the 1,237 "slower but higher OBP" players was .345--the 25 point separation which was the intended target. I probably should have mentioned that in the article.
1:46 PM May 23rd

trn6229
I remember when Ralph Houk led off with Wade Boggs. Boggs had average speed but could hit and drew lots of walks. Both Boggs and Rickey Henderson were born in 1958, Boggs on June 15 and Henderson on December 25. From 1983 to 1989, Boggs scored 100, 109, 107, 107, 108, 128 and 113. He never scored 100 after that. Henderson scored 105, 113, 146, 130, 78, 113 and 119. That's 772 for Boggs, 804 for Henderson, though Rickey played 95 games in 1987. Bobby Bonds and Lou Brock match up. Lou Brock is seven years older than Bobby Bonds. From 1969 to 1977, Bonds scored 120, 134, 110, 118, 131, 97, 93, 48 and 103. Brock scored 114, 126, 81, 110, 105, 78, 73 and 69. Bonds had 954 runs, Brock has 766. Bobby Richardson was not a suitable leadoff man. The key element is to get on base. Tommie Agee led off for the Mets. He scored 97 in 1969 and 107 in 1970. Gil Hodges liked his leadoff man to have some pop. The Giants should not have traded Bonds, Maddox and Matthews. Had they kept them together, they would have done well.

Take Care,
Tom Nahigian
1:40 PM May 23rd

OwenH
Bill, thanks for sharing this study and for your work on it. I appreciate the deeper dive into the question. It's interesting that the OBP-superior players seem to be a bit more valuable in terms of the moment, i.e. who would be more valuable to have on your team and in your lineup at the time; but the speedier players have longer and somewhat better careers, because they decline more slowly -- since they have more speed to lose, and of course speed is valuable on defense as well as offense.

With the original question, I specified a 25-point edge in OBP for the slower player, because that seemed to me to be around the margin where the question gets unclear; a fast guy with only 10 fewer points of OBP would clearly be better to have, and likewise a slower player with a 40 point OBP edge is also clearly superior. It's interesting to speculate about where that break point in the margin is; and of course it's impossible to know for sure. Great study, interesting stuff.
11:38 AM May 23rd

TJNawrocki
I am old enough to remember Peanuts Lowrey, even though I am younger than you. I remember him as a coach on the Cubs when I was a little kid, and when you are a little kid, Peanuts Lowrey is about the most memorable name there is.
10:46 AM May 23rd

StatsGuru
It is worth remembering that on a team level, OBP and runs have an exponential relationship. OBP works like an interest rate, the higher the better. Over 600 plate appearances, a .375 OBP player uses 15 fewer outs than a .350 player. Those 15 outs get distributed to the rest of the team. In a way, the other eight batters get to play five more innings. That's probably worth two or three more runs. So high OBP players have a salutary effect on the rest of the team, without that necessarily showing up in the high OBP player's record.
8:43 AM May 23rd

SteveN
I'd love to know how you do matched pairs. I'm guessing that you use baseball reference, but, I have no idea how to do this. Since I am a month or so older than you, crap, I have plenty of time to fiddle with this sort of stuff.

One of the charms of your Abstracts was that you could "show your work" in an entertaining way.
7:52 AM May 23rd

MarisFan61
I thought you were going to do something pretty much like this when you said that on Hey Bill!

Just a couple of questions about your choices on the criteria:

-- Of course it was good to insist on more than (for example) just a 1-point difference (i.e. .001) in on-base average, i.e. to not let it be enough just that 2 players were in different "categories." But, as I was reading down that paragraph, I was surprised you settled for just a 10-point difference. I was expecting at least something like 20, especially since you had already mentioned that the system yielded many more matches than you needed. I guess a 10-point difference "felt" to you like it made enough of a difference to suit the question? I wouldn't have thought it would.
I guess a simpler way of asking this: If you had a do-over, would you use a larger requirement on that differential? (I do realize that making it something like 20 rather than 10 would probably have given quite a bit smaller of a sample.)

-- I wonder similarly, if you had a do-over, if in addition to the 100-game requirement, you would have required some full-season-ish amount of plate appearances (maybe even like just 300); or, alternatively, not allow a pair to be a match unless their PA's were within a certain range of each other -- to avoid things like the Jim Rice-Peanuts Lowrey thing. (I imagine there were quite a few similar matches in the sample, and I would think that those kind of matches pollute the results.)
BTW, if I'd been on some million dollar quiz show and there was a question on how to spell Peanuts Lowrey's last name, I would have said L-O-W-E-R-Y without any thought of using a life-line.

12:57 AM May 23rd
