Speed and On Base Percentage

May 22, 2020

 

            I had a question in the "Hey, Bill" section on Thursday, May 21st, which was "Suppose that you have two players who are otherwise equal, except that one. . ." Well, here, let me quote the question as it was asked.

 

Would you rather have a fast leadoff man with a .350 OBP, or a slow leadoff man with a .375 OBP?  All other things being equal.

Asked by: OwenH

 

            Tom Tango weighed in on this question, and then I offered the thought that I’d probably rather have the fast guy because he would probably have more defensive value and thus would probably have the longer career.  Then I thought of a study related to that issue, so I wasted the rest of the day doing that study, rather than doing the work I had promised someone else that I would do.  

            My study actually doesn’t focus on LEADOFF MEN; it focuses simply on PLAYERS.   I changed the question to "Would you rather have a fast player with a .350 OBP, or a slower player with a .375 OBP, all other things being equal?"  

            If you have been reading this stuff for a while, you know how much I love to do matched-set studies, and I thought of a matched-set study to look at this issue.  I started with all players who played 100 games in a season from the years 1900 to 2010; I think there were 18,454 such players.  

            For each of those players I created a seven-element code, based on:

 

1)     His playing time, Games and Plate Appearances,

2)     His batting average (20-point buckets, .300 to .319999, etc.),

3)     His on base percentage (25-point buckets, .350 to .3749999, etc.),

4)     His slugging percentage (40-point buckets),

5)     His age,

6)     His defensive position, and

7)     His speed. 

I could spend 20 pages explaining EXACTLY how all of this was done, but you know. . .how would reading that make your life better? 

In the study there were several million potential codes, whereas there were only 18,000 and some players, so most players have unique codes; that is, most players have codes that do not exactly match any other player’s.  12,647 players had unique codes, but that still leaves 5,807 players who had codes identical to some other player. The largest "code group" is the code 3 13 13 09 08 1 3, which is center fielders who were 26 or 27 years old, ran well and who hit .260 to .27999 with an on base percentage of .325 to .349999, a slugging percentage of .360 to .39999, and playing time equivalent to about 115 to 120 games.  The eight players who fit into this box were Bernie Neis (1923), Taylor Douthit (1927), Jimmy Welsh (1930), Chuck Diering (1949), Jim Delsing (1952), Mike Devereaux (1989), Jermaine Allensworth (1998) and Mark Kotsay (2003). 
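The bucket numbers in that example code are consistent with simple integer division of each rate stat by its bucket width: .260 / .020 gives 13, .325 / .025 gives 13, and .360 / .040 gives 9. Here is a minimal sketch, assuming that is how the groups were numbered. The function names and the use of whole "points" (thousandths) are illustrative and are not taken from the actual spreadsheet; the playing-time, age, position and speed elements are left out because the article does not say how they were encoded.

```python
def rate_bucket(points: int, width: int) -> int:
    """Return the bucket number for a rate stat given in points (.273 -> 273)."""
    return points // width


# Bucket widths from the list above, in points (thousandths).
AVG_WIDTH, OBP_WIDTH, SLG_WIDTH = 20, 25, 40


def rate_code(avg: int, obp: int, slg: int) -> str:
    """Format the three rate-stat elements the way they appear in the example code."""
    parts = ((avg, AVG_WIDTH), (obp, OBP_WIDTH), (slg, SLG_WIDTH))
    return " ".join(f"{rate_bucket(value, width):02d}" for value, width in parts)


# A hypothetical .272 / .340 / .385 season lands in the same rate-stat box
# as the eight center fielders listed above.
print(rate_code(272, 340, 385))  # 13 13 09
```

Working in whole points rather than decimals avoids floating-point surprises right at a boundary such as .350.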

What I was interested in, though, was not players who had identical codes, but matched sets of players who had identical codes except that one had a higher on base percentage, and the other one was faster.  To identify those, I created a "second code" for each player, which was the same as the other code, except the player in his second code was one group lower in his on base percentage, but one group higher in his speed. 
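One way to picture that lookup, as a sketch rather than a description of the actual spreadsheet: store each season's code as a tuple, build the second code by shifting two elements, and look the result up in an index of first codes. The `Season` type, the element positions, the sample seasons, and the assumption that a higher speed group number means a faster player are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Season(NamedTuple):
    player: str
    year: int
    code: tuple  # seven group numbers; the positions below are assumed


OBP_POS, SPEED_POS = 2, 6  # assumed positions of OBP and speed in the code


def second_code(code: tuple) -> tuple:
    """Same code, one OBP group lower and one speed group higher."""
    shifted = list(code)
    shifted[OBP_POS] -= 1
    shifted[SPEED_POS] += 1
    return tuple(shifted)


def matched_sets(seasons):
    """Pair each season with every season whose first code equals its second code."""
    by_code = defaultdict(list)
    for s in seasons:
        by_code[s.code].append(s)
    return [(s, other)
            for s in seasons
            for other in by_code.get(second_code(s.code), [])]


# Made-up seasons: A gets on base more, B is faster, C matches nobody.
seasons = [
    Season("A", 1975, (3, 13, 14, 9, 8, 1, 2)),
    Season("B", 1978, (3, 13, 13, 9, 8, 1, 3)),
    Season("C", 1980, (2, 12, 12, 8, 4, 2, 1)),
]
pairs = matched_sets(seasons)
for high_obp, fast in pairs:
    print(high_obp.player, "is matched with", fast.player)  # A is matched with B
```

In each pair the first season is the higher-OBP player and the second is the faster one, which is the orientation the rest of the study relies on.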

This process yielded a list of 1,987 Matched Sets of players—many more than I need to do my stupid little study.   Some of those matches, however, are not really what we are looking for.  For example, Rich Rollins in 1962 is a match with Robin Ventura in 1992.

First    Last       YEAR   AB    R    H   HR  RBI   BB   SB   Avg  SPct   OPS
Rich     Rollins    1962  624   96  186   16   96   75    3  .298  .428  .802
Robin    Ventura    1992  592   85  167   16   93   93    2  .282  .431  .806

 

Rich Rollins and Robin Ventura were both 24-year-old third basemen, but what the chart above doesn’t tell you is that Ventura’s on base percentage was .375, and Rollins’s was .374.   Sorting them by 25-point buckets, that put Ventura into a higher OBP group than Rollins, but it’s not really what we’re looking for here. 

So I put in a rule that, in order to qualify as a matched set, one of the players had to have an on base percentage at least 10 points higher than the other player.  The grouping system created a separation potential of 1 point to 49 points of on base percentage; I changed that to 10 points to 49 points, and thus an average of about 25 points.  That process eliminated 244 of my matches. 
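The 1-to-49 range, and what the 10-point floor does to it, can be checked by brute force. This sketch assumes, purely for illustration, that every whole-point value in each of two adjacent 25-point buckets is equally likely; real players are not spread that evenly, so the averages are only indicative.

```python
BUCKET = 25   # width of an OBP group, in points
FLOOR = 10    # minimum separation required by the rule above

# Offsets of the lower player (0-24) and the higher player (25-49),
# both measured from the bottom of the lower bucket.
gaps = [hi - lo for lo in range(BUCKET) for hi in range(BUCKET, 2 * BUCKET)]
kept = [g for g in gaps if g >= FLOOR]

print(min(gaps), max(gaps), sum(gaps) / len(gaps))  # 1 49 25.0
print(min(kept), max(kept))                         # 10 49
print(round(sum(kept) / len(kept), 1))              # 26.4
```

Under that even-spread assumption the floor discards only the closest pairs, so the average separation moves up only slightly.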

A similar thing with speed.  I have "speed scores" for every player every year (1 to 10) that are a part of my regular database, and these are pretty reliable, but the difference between a "7" and a "6" in "Speed" is not necessarily all that reliable, much like the difference between a .375 on base percentage and a .374 on base percentage.  I added a rule that the difference in Speed has to be at least 2 categories.   This eliminated another 214 of my matches, leaving me with 1,519 matched sets, which is still a lot more than I need. 
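Taken together, the two extra rules amount to a simple filter on each candidate pair. A minimal sketch follows; the function name and argument layout are made up for illustration, and the speed gap used for Ventura and Rollins is a placeholder, since the article does not give their speed scores.

```python
def qualifies(obp_gap_points: int, speed_gap: int) -> bool:
    """OBP at least 10 points apart AND speed scores at least 2 categories apart."""
    return obp_gap_points >= 10 and speed_gap >= 2


# Ventura (.375) vs. Rollins (.374): different OBP buckets, but only 1 point apart.
# The speed gap of 2 is a placeholder; the pair fails on OBP regardless.
print(qualifies(375 - 374, speed_gap=2))  # False

# A hypothetical pair 25 points and 3 speed categories apart.
print(qualifies(25, 3))                   # True
```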

Not all of those, however, are good matches.   Some of them frankly are terrible matches.  To cite two examples of comically bad matches. . . Jim Rice, 1987, winds up matched with Peanuts Lowrey, 1953, and Al Kaline, 1970, winds up matched with John Vander Wal, 2001. 

Al Kaline in 1970 and John Vander Wal in 2001 were both 35-year-old right fielders.   Kaline hit .278 with 16 homers, 71 RBI; Vander Wal hit .270 with 14 homers, 70 RBI, about the same doubles, triples and runs scored as Kaline—nonetheless, John Vander Wal is not a good match with Al Kaline, because Al Kaline is Al Kaline and John Vander Wal is John Vander Wal. 

I created a 22-point Similarity Score to evaluate each of the Matched Sets.  Kaline and Vander Wal were very close in the one-season categories that formed the match, but they were miles apart in the career-to-date categories of the Similarity system: Career Games, Career Hits, Career Homers, Career Total Bases and other things.   If you remember Peanuts Lowrey (I am not old enough to remember Peanuts Lowrey, but I inherited his baseball card from a brother-in-law), you know how absurd the notion is of Jim Rice being similar to Peanuts Lowrey or, for that matter, anybody nicknamed "Peanuts".  The Lowrey/Rice match is, in a sense, even more absurd, because not only is there a massive difference in their careers up to the point of the comparison, but the seasons are not truly similar, either.  Rice hit .277, Lowrey .269, and Lowrey actually had a higher slugging percentage than Rice did in that one year, .423 to .408.   But Rice had 404 at bats, 13 homers, 62 RBI in the focus season; Lowrey had 182 at bats, 5 homers, 27 RBI.  The bad match happened because the players who played in 100 or more games were sorted into 4 buckets, and those players who were near the 100-game limit had an open end on the bottom; Rice was near the top of that playing-time group, and Lowrey was near the bottom. 

The Similarity Score system had 17 "negative" categories and 5 "positive" categories.   The five positive, or "reverse", categories were triples, stolen bases, speed, walks, and on base percentage.   Whereas a difference between the two players in any other category was considered to be a negative, a difference in any of those five categories was considered to be a positive, since we were looking for players who were similar in all other respects, but different in speed and on base percentage.

In the Similarity System, points were deducted from the match if a right-handed batter was compared to a left-hander, and points were deducted for each year of difference in which the seasons occurred.  In general, I didn’t want to match a season from 1930 with a season from 1980.   I preferred to match a season from 1930 with a season from 1925, and a season from 1980 with a season from 1984.   Fewer contextual problems.
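The shape of such a score can be sketched in a few lines. Everything numeric here is an assumption made for illustration: the starting value of 1000, the per-category weights, the per-year weight and the handedness penalty are invented, and only the list of five reverse categories comes from the text. The two sample seasons are made up as well.

```python
# Categories where a difference ADDS to the score (the five named above).
REVERSE = {"triples", "stolen_bases", "speed", "walks", "obp"}


def similarity(a: dict, b: dict, weights: dict, base: float = 1000.0,
               year_weight: float = 0.0, hand_penalty: float = 0.0) -> float:
    """Start from `base`; subtract for ordinary differences, add for reverse ones."""
    score = base
    for category, weight in weights.items():
        points = abs(a[category] - b[category]) * weight
        score += points if category in REVERSE else -points
    score -= abs(a["year"] - b["year"]) * year_weight  # seasons far apart cost points
    if a["bats"] != b["bats"]:                         # right-hander vs. left-hander
        score -= hand_penalty
    return score


# Two made-up seasons and made-up weights.
fast = {"year": 1980, "bats": "R", "home_runs": 17, "triples": 7}
slow = {"year": 1979, "bats": "R", "home_runs": 15, "triples": 2}
weights = {"home_runs": 1.0, "triples": 2.0}

score = similarity(fast, slow, weights, year_weight=0.5, hand_penalty=10.0)
print(score)  # 1000 - 2 (homers) + 10 (triples) - 0.5 (one year apart) = 1007.5
```

Flipping the sign for the reverse categories is what lets a pair that differs sharply in speed and walks, but in little else, rise to the top of the list.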

Let’s talk now about some of the better matches, some of the good ones.   Jody Reed, 1991, was a 29-year-old second baseman; Damaso Garcia, 1984, was a 28-year-old second baseman.   Their stats were extremely similar (below), except that Reed drew 60 walks to Garcia’s 16, and Garcia stole 46 bases to Reed’s 6. 

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Jody     Reed       1991  153  618   87  175   42    2    5   60   60   53    6  .283
Damaso   Garcia     1984  152  633   79  180   32    5    5   46   16   46   46  .284

 

Reed and Garcia also had similar career batting stats at that point, Reed hitting .288 with 14 homers in 572 games, and Garcia hitting .289 with 18 homers in 663 games.  We will pause here to note, regarding OwenH’s question, that Reed scored eight more runs than Garcia, 87-79. Garcia batted leadoff for Toronto, whereas Reed batted second for Boston.  The offenses of the two teams were similar, Boston scoring 731 runs, Toronto 750.  So. . . make what you will out of a sample of one.   

A second very good example which is recent enough for some of you to relate to is Chet Lemon, 1979, and Andre Dawson, 1980.  These are their primary batting stats:

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Andre    Dawson     1980  151  577   96  178   41    7   17   87   44   69   34  .308
Chet     Lemon      1979  148  556   79  177   44    2   17   86   56   68    7  .318

 

Lemon was then 24, Dawson 25.  They had very similar seasons, and also essentially similar careers up to that point, except that Lemon walked more than Dawson, and Dawson was much faster than Lemon.  Lemon was also an exceptional defensive center fielder, but he was not as fast as Dawson. 

Another interesting matchup is Magglio Ordonez, 2001, against Raul Mondesi, 1997, both right fielders.  Ordonez drew more walks; Mondesi was faster:

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Magglio  Ordonez    2001  160  593   97  181   40    1   31  113   70   70   25  .305
Raul     Mondesi    1997  159  616   95  191   42    5   30   87   44  105   32  .310

 

Jack Clark, 1982, is matched with Johnny Callison, 1965:

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Jack     Clark      1982  157  563   90  154   30    3   27  103   90   91    6  .274
Johnny   Callison   1965  160  619   93  162   25   16   32  101   57  117    6  .262

 

Roger Maris and Willie Kirkland, both right fielders born in 1934, had similar seasons in 1959; Kirkland was faster and Maris walked more:

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Roger    Maris      1959  122  433   69  118   21    7   16   72   58   53    2  .273
Willie   Kirkland   1959  126  463   64  126   22    3   22   68   42   84    5  .272

 

Dwight Evans, 1985, matches Bobby Bonds, 1978:

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Dwight   Evans      1985  159  617  110  162   29    1   29   78  114  105    7  .263
Bobby    Bonds      1978  156  565   93  151   19    4   31   90   79  120   43  .267

 

These are all good matches, but the BEST match in the data, by the Similarity System, is actually two leadoff men from 100 years ago:  Clyde Milan, with the 1908 Washington Senators, and Whitey Witt, with the 1917 Philadelphia Athletics:

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Whitey   Witt       1917  128  452   62  114   13    4    0   28   65   45   12  .252
Clyde    Milan      1908  130  485   55  116   10   12    1   32   38   --   29  .239

 

Milan was the Vince Coleman of his era, a player who would later steal 88 bases in a season.  Witt ran well, but not that well, and walked more.  Both men played for bad teams.  Milan’s Senators, 67-85, scored 479 runs.  Witt’s Athletics, 55-98, scored 529.   But again, we see that Witt scored more runs than Milan, and did so while making 28 fewer outs. 

Pie Traynor, Pirates 24-year-old third baseman in 1924, matches Bob Elliott, 25-year-old third baseman for the same team a generation later:

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Pie      Traynor    1924  142  545   86  160   26   13    5   82   37   26   24  .294
Bob      Elliott    1942  143  560   75  166   26    7    9   89   52   35    2  .296

 

Carlos Pena, a 25-year-old, left-handed hitting first baseman from 2003, matches Hee Seop Choi, a 25-year-old, left-handed hitting first baseman from 2004:

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Hee Seop Choi       2004  126  343   53   86   21    1   15   46   63   96    1  .251
Carlos   Pena       2003  131  452   51  112   21    6   18   50   53  123    4  .248

 

In order to qualify for the study, the two seasons (and careers) had to have a similarity score of at least 960.   This (below) is actually the WORST matchup that qualifies for the study; it scores at 960.02:

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Magglio  Ordonez    2003  160  606   95  192   46    3   29   99   57   73    9  .317
Babe     Herman     1931  151  610   93  191   43   16   18   97   50   65   17  .313

 

Ordonez did this in 2003, Herman 72 years earlier, so that creates a relatively massive 24-point penalty for the comparison.   It recovers, however, because Herman hit 16 triples to Ordonez’ 3, which creates a 26-point POSITIVE for the match, since a difference in triples scores as a positive.   That enables the Ordonez/Herman match to scrape over the barrier, and be included in the study. 
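Those two adjustments imply per-unit rates, if one assumes each adjustment scales linearly with the size of the difference. That linearity is an assumption made here for illustration; the article does not state how the points were computed.

```python
year_gap = 2003 - 1931   # 72 seasons apart
year_penalty = 24        # points deducted, as quoted above
triples_gap = 16 - 3     # 13 triples apart
triples_bonus = 26       # points added, as quoted above

per_year = year_penalty / year_gap         # one third of a point per year apart
per_triple = triples_bonus / triples_gap   # two points per triple of difference

print(round(per_year, 3), per_triple)      # 0.333 2.0
print(triples_bonus - year_penalty)        # 2
```

A net gain of two points from these two items alone is tiny, which fits a pair that clears the 960 cutoff by only 0.02.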

Steve Garvey, a 32-year-old first baseman in 1981, is matched against Ted Kluszewski, a 33-year-old first baseman in 1958:

First    Last       YEAR    G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Steve    Garvey     1981  110  431   63  122   23    1   10   64   25   49    3  .283
Ted      Kluszewski 1958  100  301   29   88   13    4    4   37   26   16    0  .292

 

But that match doesn’t qualify.  Garvey at that time had played 1,565 career games with a .303 career average, 195 home runs, and 2,743 total bases, and Kluszewski had 1,439 games, 255 homers, a .302 average and 2,663 total bases.   It’s a pretty decent match, but it’s not good enough for the study. 

I wound up with 1,237 qualifying matches. 

 

Conclusion

I would suspect, I would believe, that if I had stuck with the original question which was posed, I would have reached the opposite conclusion.  The original question was whether a fast leadoff man with a .350 on base percentage would be preferred to a slower leadoff man with a .375 on base percentage, "preferred" probably meaning that he would score more runs.   I would strongly suspect that the general answer to THAT question would be that the slow guy who got on base more would score more runs. 

However, my instinctive guess that speed would be more important in shaping the future careers of the otherwise-even players was definitively correct.   This chart shows that the two groups of players had similar career numbers, up through the point at which the players were matched:

Test Group     G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Speed        633 2225  301  606  101   26   42  258  173  269   65  .268
High OBA     649 2226  299  608  105   19   51  286  235  271   31  .268

Test Group    BB   SB   Avg   OBA  SPct   OPS
Speed        173   65  .268  .321  .386  .707
High OBA     235   31  .268  .339  .391  .730

 

                                     

 

            All players were matched against other players who played the same defensive position, and there is a ten-day difference in the average age of the combatants, the "Speed" players averaging 27.68 years, and the "On Base" players averaging 27.71. 

The High-OBA group had a higher OPS, because they drew more walks although their batting averages were the same.  However, in the rest of their careers, the Speed group pulled 5 points ahead in batting average, reduced the gap in OBA from 18 points to 12, eliminated the 5-point gap in slugging percentage, and moved further ahead in stolen bases, increasing the gap from two to one (65-31) to three to one (53-18). 

Test Group     G   AB    R    H   2B   3B   HR  RBI   BB   SO   SB   Avg
Speed        692 2345  306  633  109   21   48  282  198  286   53  .254
High OBA     613 2005  260  539   95   13   51  267  229  258   18  .249

Test Group    BB   SB   Avg   OBA  SPct   OPS
Speed        198   53  .254  .311  .365  .677
High OBA     229   18  .249  .323  .364  .687
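The before-and-after gaps described above can be recomputed directly from the group averages in the charts. This is only a bookkeeping sketch; the dictionary layout and key names are made up for illustration, and the figures are copied from the charts in points (thousandths) or raw counts.

```python
before = {"Speed":    {"avg": 268, "obp": 321, "slg": 386, "sb": 65},
          "High OBA": {"avg": 268, "obp": 339, "slg": 391, "sb": 31}}
after = {"Speed":    {"avg": 254, "obp": 311, "slg": 365, "sb": 53},
         "High OBA": {"avg": 249, "obp": 323, "slg": 364, "sb": 18}}


def gaps(chart: dict) -> dict:
    """Gaps between the two groups; the OBP and slugging gaps favor High OBA when positive."""
    s, h = chart["Speed"], chart["High OBA"]
    return {"avg_lead_speed": s["avg"] - h["avg"],
            "obp_gap": h["obp"] - s["obp"],
            "slg_gap": h["slg"] - s["slg"],
            "sb_ratio": round(s["sb"] / h["sb"], 2)}


print(gaps(before))  # {'avg_lead_speed': 0, 'obp_gap': 18, 'slg_gap': 5, 'sb_ratio': 2.1}
print(gaps(after))   # {'avg_lead_speed': 5, 'obp_gap': 12, 'slg_gap': -1, 'sb_ratio': 2.94}
```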

 

                                     

 

But of most significance, the Speed Group played an average of 79 additional games in the rest of their careers.  With 1,237 players in each group, that creates a separation in "rest of career games" of a whopping 98,000 games—855,933 to 757,912.  Obviously, a gap of that size cannot result from chance.  There were 690 sets of players in which the "speed" player played more games in the rest of his career, 4 sets in which the two played exactly the same number of remaining games, and 543 cases in which the "OBA" player played more games.  56% of the time, the "Speed" player had the longer career after the match.    
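The arithmetic in that paragraph is easy to verify, and a rough sign test gives a feel for how unlikely a 690-to-543 split would be if the two groups were really even. The sign test is an added check, not something done in the study. It uses a normal approximation, drops the 4 ties, and treats every matched set as independent, which may overstate the case if the same player-season appears in more than one set.

```python
import math

speed_longer, ties, oba_longer = 690, 4, 543
pairs = speed_longer + ties + oba_longer   # matched sets
total_gap = 855_933 - 757_912              # difference in rest-of-career games

n = speed_longer + oba_longer              # ties dropped for the sign test
z = (speed_longer - n / 2) / math.sqrt(n * 0.25)
p_two_sided = math.erfc(z / math.sqrt(2))  # normal-approximation p-value

print(pairs, total_gap, round(total_gap / pairs, 1))  # 1237 98021 79.2
print(round(speed_longer / pairs, 3))                 # 0.558
print(round(z, 2))                                    # 4.19
print(p_two_sided < 0.001)                            # True
```

A split that lopsided sits more than four standard deviations from an even coin flip, which is consistent with the claim that the gap is not a chance result.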

 

 

 

 

 
 

COMMENTS (11 Comments, most recent shown first)

MarisFan61
Bill: Thanks for the reply about what differential you required on on-base.
I did assume that the average differential would be substantial, but where I was coming from was, it seems to me that including ANY matched pairs with as small a differential as .010 would mean that you've got pairs where there just isn't a meaningful difference at all on that important criterion.
Seems to me that with a matched pair where the difference in on-base average is just .010 or even a little more than that, the differential is sort of just "noise" -- and that you wouldn't want such a pair in the data at all.
2:03 PM May 23rd
 
bjames
SteveN
I'd love to know how you do matched pairs. I'm guessing that you use baseball reference, but, I have no idea how to do this. Since I am a month or so older than you, crap, I have plenty of time to fiddle with this sort of stuff.



It's not done with Baseball Reference at all. I keep a "spreadsheet encyclopedia" in excel of all batters/seasons. I maintain it myself. It has some flaws and failings, but since I know what those are, I can usually skate around them. The matched-set studies are usually done by sorting seasons in Excel, either in that file or in the similar one I maintain for pitchers.

I'd be happy to share the spreadsheet with others, but it needs to be . . ..what's the word? Upgraded? Rectified? Corrected? You'd be welcome to it, but it has too many flaws to be distributed as is.
1:50 PM May 23rd
 
bjames
From MarisFan61


Just a couple of questions about your choices on the criteria:

-- Of course it was good to insist on more than (for example) just a 1-point difference (i.e. .001) in on-base average, i.e. to not let it be enough just that 2 players were in different "categories." But, as I was reading down that paragraph, I was surprised you settled for just a 10-point difference.

You are focusing on one factor, but there are three factors operating. Owen had specified a 25-point separation in on base percentage. If the groups are 25 points wide, a player in a higher group could be as little as 1 point higher (.000000001 or something) or as much as 49 points higher. It would average 25 points.

But the players were selected to be otherwise SIMILAR, which would reduce the average below 25 points, on average.

However, I then put in the "similarity" routine, in which a difference in on base percentage was treated as a positive. This would push the average back up, toward 25 points.

In fact, the average on base percentage of the 1,237 "faster but lower on base percentage" players in this study, in the base season, was .320. The average on base percentage of the 1,237 "slower but higher OBP" players was .345--the 25 point separation which was the intended target. I probably should have mentioned that in the article.
1:46 PM May 23rd
 
trn6229
I remember when Ralph Houk led off with Wade Boggs. Boggs had average speed but could hit and drew lots of walks. Both Boggs and Rickey Henderson were born in 1958, Boggs on June 15 and Henderson on December 25. From 1983 to 1989, Boggs scored 100, 109, 107, 107, 108, 128 and 113. He never scored 100 after that. Henderson scored 105, 113, 146, 130, 78, 113 and 119. That's 772 for Boggs, 804 for Henderson though Rickey played 95 games in 1987. Bobby Bonds and Lou Brock match up. Lou Brock is seven years older than Bobby Bonds. From 1969 to 1977, Bonds scored 120, 134, 110, 118, 131, 97, 93, 48 and 103. Brock scored 114, 126, 81, 110, 105, 78, 73 and 69. Bonds had 954 runs, Brock has 766. Bobby Richardson was not a suitable lead off man. The key element is to get on base. Tommie Agee led off for the Mets. He scored 97 in 1969 and 107 in 1970. Gil Hodges liked his lead off man to have some pop. The Giants should not have traded Bonds, Maddox and Matthews. Had they kept them together, they would have done well.

Take Care,
Tom Nahigian
1:40 PM May 23rd
 
OwenH
Bill, thanks for sharing this study and for your work on it. I appreciate the deeper dive into the question. It's interesting that the OBP-superior players seem to be a bit more valuable in terms of the moment, i.e. who would be more valuable to have on your team and in your lineup at the time; but the speedier players have longer and somewhat better careers, because they decline more slowly -- since they have more speed to lose, and of course speed is valuable on defense as well as offense.

With the original question, I specified a 25-point edge in OBP for the slower player, because that seemed to me to be around the margin where the question gets unclear; a fast guy with only 10 fewer points of OBP would clearly be better to have, and likewise a slower player with a 40 point OBP edge is also clearly superior. It's interesting to speculate about where that break point in the margin is; and of course it's impossible to know for sure. Great study, interesting stuff.
11:38 AM May 23rd
 
TJNawrocki
I am old enough to remember Peanuts Lowrey, even though I am younger than you. I remember him as a coach on the Cubs when I was a little kid, and when you are a little kid, Peanuts Lowrey is about the most memorable name there is.
10:46 AM May 23rd
 
StatsGuru
It is worth remembering that on a team level, OBP and runs have an exponential relationship. OBP works like an interest rate, the higher the better. Over 600 plate appearances, a .375 OBP player uses 15 fewer outs than a .350 player. Those 15 outs get distributed to the rest of the team. In a way, the other eight batters get to play five more innings. That's probably worth two or three more runs. So high OBP players have a salutary effect on the rest of the team, without that necessarily showing up in the high OBP player's record.
8:43 AM May 23rd
 
SteveN
I'd love to know how you do matched pairs. I'm guessing that you use baseball reference, but, I have no idea how to do this. Since I am a month or so older than you, crap, I have plenty of time to fiddle with this sort of stuff.

One of the charms of your Abstracts was that you could "show your work" in an entertaining way.
7:52 AM May 23rd
 
337
And a question: is the advantage in "career games played" a constant? That is, forgetting the purposes of this essay, if you created two groups of players, identical except for the speed, forgetting entirely about the OBP, would the faster group still have a 56-43 edge in career length? I'd think it would increase more for the faster players, but how much?
5:21 AM May 23rd
 
337
Your best study so far this year, as measured by head-to-head comparisons. Seriously, I love these sorts of studies, far-ranging but covering specific players 1-to-1. I wonder if anyone else sees "Peanuts Lowrey" and thinks "Great fungo hitter" automatically? No idea why I think that, but I do, and I'm pretty sure it's right.
5:15 AM May 23rd
 
MarisFan61
I thought you were going to do something pretty much like this when you said that on Hey Bill!

Just a couple of questions about your choices on the criteria:

-- Of course it was good to insist on more than (for example) just a 1-point difference (i.e. .001) in on-base average, i.e. to not let it be enough just that 2 players were in different "categories." But, as I was reading down that paragraph, I was surprised you settled for just a 10-point difference. I was expecting at least something like 20, especially since you had already mentioned that the system yielded many more matches than you needed. I guess a 10-point difference "felt" to you like it made enough of a difference to suit the question? I wouldn't have thought it would.
I guess a simpler way of asking this: If you had a do-over, would you use a larger requirement on that differential? (I do realize that making it something like 20 rather than 10 would probably have given quite a bit smaller of a sample.)

-- I wonder similarly, if you had a do-over, if in addition to the 100-game requirement, you would have required some full-season-ish amount of plate appearances (maybe even like just 300); or, alternatively, not allow a pair to be a match unless their PA's were within a certain range of each other -- to avoid things like the Jim Rice-Peanuts Lowrey thing. (I imagine there were quite a few similar matches in the sample, and I would think that those kinds of matches pollute the results.)
BTW, if I'd been on some million dollar quiz show and there was a question on how to spell Peanuts Lowrey's last name, I would have said L-O-W-E-R-Y without any thought of using a life-line.

12:57 AM May 23rd
 
 
©2020 Be Jolly, Inc. All Rights Reserved.