More Speed Vs. OBP

May 23, 2020
                                           More Speed and OBP

 

            Minutes after finishing the other study about Speed and On Base Percentage, I thought of another way to study the problem.   As I have discussed here before, I maintain a database of players’ career game logs, really just because I like to have it; it is now up to 157 players’ careers, and 272,414 lines of data, each representing a game by a player.  The data records what each player’s batting order position was, so it occurred to me that I could use that to study more directly the question posed by Owen H:  which is more valuable, a fast leadoff man with a .350 on base percentage, or a slow leadoff man with a .375 on base percentage? 

            The first step in the study was to create speed scores for each player, based on the data in the file.   I’ve got five things in the file which can be used for that:  Stolen Bases, Triples, Grounded into Double Plays, Defensive Position and Reached On Error.  I gave the player:

            25 points for a Stolen Base

            50 points for a Triple

            Negative 40 points for Grounding into a Double Play

            2 points for reaching on an error

            2 points if his first defensive position in the game was center field or pinch runner

            1 point if it was left or right field or shortstop

            Negative 1 point if it was first base

            Negative 2 points if it was catcher, DH, or pinch hitter
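
            As a rough illustration, the point values above could be tallied per game along these lines.  This is only a sketch: the GameLine record, its field names, and the position codes are invented for the example and are not the layout of the actual file, and positions not named in the list are assumed to score zero.

```python
from dataclasses import dataclass

# Points for the player's first defensive position in the game.
# Position codes are hypothetical; unlisted positions are assumed to be worth 0.
POSITION_POINTS = {
    "CF": 2, "PR": 2,
    "LF": 1, "RF": 1, "SS": 1,
    "1B": -1,
    "C": -2, "DH": -2, "PH": -2,
}


@dataclass
class GameLine:
    """One player-game; field names are made up for this sketch."""
    sb: int          # stolen bases
    triples: int
    gdp: int         # grounded into double plays
    roe: int         # reached on error
    first_pos: str   # first defensive position in the game


def speed_points(game: GameLine) -> int:
    """Speed points for a single game, using the weights listed in the article."""
    return (
        25 * game.sb
        + 50 * game.triples
        - 40 * game.gdp
        + 2 * game.roe
        + POSITION_POINTS.get(game.first_pos, 0)
    )


# A center fielder with a steal and a triple: 25 + 50 + 2 = 77 points.
print(speed_points(GameLine(sb=1, triples=1, gdp=0, roe=0, first_pos="CF")))
```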

 

            Then I created rolling totals for each player over each 101-game sample of his career (the game itself plus the 50 games on either side of it), and divided that by his plate appearances in those 101 games.  The player has a different speed score every day.   According to the system, the fastest player in the data was Willie Wilson on August 10, 1978; he had 972 Speed Points in the 101 games surrounding that game, and only 139 plate appearances, an average of almost 7 points per plate appearance (6.993).  The slowest player/day was Ernie Lombardi on April 22, 1944; he had negative 1,187 points with 348 plate appearances, an average of negative 3.41 points per plate appearance. 
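
            One way the rolling score might be computed is sketched below, using a window centered on each game.  The function name and arguments are hypothetical, and the handling of the first and last games of a career (where a full window is not available) is an assumption; the article does not say what was done there.

```python
from itertools import accumulate


def rolling_speed_scores(points, plate_apps, half_window=50):
    """Speed points per plate appearance over a window centered on each game.

    points[i] and plate_apps[i] are the speed points and plate appearances in
    the player's i-th career game.  With half_window=50 the window is the 101
    games of which game i is the central one.  Near the start or end of a
    career the window is simply truncated (an assumption of this sketch).
    """
    n = len(points)
    cum_pts = [0, *accumulate(points)]     # prefix sums for fast window totals
    cum_pa = [0, *accumulate(plate_apps)]
    scores = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        pts = cum_pts[hi] - cum_pts[lo]
        pa = cum_pa[hi] - cum_pa[lo]
        scores.append(pts / pa if pa else None)
    return scores


# Tiny example with a 3-game window (half_window=1).
print(rolling_speed_scores([10, 0, 20], [4, 4, 2], half_window=1))

# The Willie Wilson figure from the text: 972 points in 139 plate appearances.
print(round(972 / 139, 3))
```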

            It’s a crude system, but it works.  The next-fastest player after Willie Wilson on August 10, 1978, is Willie Wilson on August 11, 1978; the fastest 70 player/daily ratings are all Willie Wilson.  The fastest players after Wilson are Miguel Dilone (peaking September 3, 1979), Lou Brock (peaking July 29, 1974 at 4.72), and Maury Wills (peaking August 8, 1962).  Wills is in 467th place, but the 466 names ahead of him are all Willie Wilson, Miguel Dilone and Lou Brock, and remember these are not scores from that game, but scores from the 101 games of which this game is the central game.  Then Davey Lopes (August 13, 1976), Cesar Cedeno (September 23, 1977), Willie Davis (July 5, 1970), Ray Lankford (September 9, 1971), and Bert Campaneris (June 20, 1965).  Slowest, after Lombardi, is Rusty Staub at the end of his career.   Raines and Henderson are not in the data yet.

            Anyway, everybody has a speed score, which changes every day.   Having assigned speed scores to everybody, I took out of the study group all games by all players in which they batted in any position other than leadoff.   This left me with 31,840 games, all by players with "known" speed, and all batting leadoff. 

            The slowest player used in the leadoff slot was Rusty Staub on August 19, 1983, but that was just a pinch-hitting appearance, not a true leadoff man.  The slowest player used as a leadoff man to start the game was Red Schoendienst on July 21, 1962.   Schoendienst would have been 39 years old at that time; all of the "slowest" players are guys at the end of their careers.

 

            Anyway, next I removed from the data all games in which:

(a)  The player did not start, or

(b)  He had fewer than 3 plate appearances, or

(c)   He had more than 7 plate appearances. 

 

Not a big deal, but I didn’t want pinch hitters and other substitutes in my data, and long extra-inning games will screw with your data sometimes.  This left me with 30,103 leadoff games, of which 15,052 were tagged as games by "fast" runners and 15,051 were tagged as games by "slow" runners. 
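
The filter and the fast/slow tagging could look something like the sketch below.  The article does not say exactly how the games were divided; the near-even split (15,052 and 15,051) suggests the games were ranked by speed score and cut in half, so that is what is assumed here.  The dictionary keys are hypothetical.

```python
def eligible(game):
    """Keep only starts with 3 to 7 plate appearances, per rules (a)-(c)."""
    return game["started"] and 3 <= game["pa"] <= 7


def split_fast_slow(games):
    """Rank eligible leadoff games by speed score and cut the list in half.

    Assumption of this sketch: with an odd number of games, the extra game
    goes to the fast group (30,103 games would give 15,052 and 15,051).
    """
    kept = sorted((g for g in games if eligible(g)),
                  key=lambda g: g["speed"], reverse=True)
    half = (len(kept) + 1) // 2
    return kept[:half], kept[half:]


games = [
    {"started": True, "pa": 4, "speed": 5.0},
    {"started": True, "pa": 5, "speed": 1.0},
    {"started": True, "pa": 3, "speed": 0.2},
    {"started": True, "pa": 7, "speed": -1.0},
    {"started": True, "pa": 4, "speed": 0.5},
    {"started": False, "pa": 1, "speed": 9.0},  # pinch hitter, dropped
    {"started": True, "pa": 8, "speed": 9.0},   # long extra-inning game, dropped
    {"started": True, "pa": 2, "speed": 9.0},   # too few plate appearances, dropped
]
fast, slow = split_fast_slow(games)
print(len(fast), len(slow))
```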

Almost everybody who leads off, probably literally everybody, is "slow" by the standards of leadoff men toward the end of his career.   Even Willie Wilson is in the slow group for a few games in the 1990s, and Maury Wills is in the slow group in 1972.   Not sure if the opposite is true, probably not; not sure if Brian Downing ever makes the fast group.  I’ll check. 

Anyway, now I have fast games and slow games; now my problem is to get fast leadoff men with a .350 on base percentage, and slow leadoff men with a .375 on base percentage.  

I did this in the following manner.  The overall on base percentage of the 30,103 leadoff men was .344.   I sorted the games at random, creating a random sequence of games within each speed group.   I started with the first 50 games for each group, randomly selected, all marked "1" or "In", and I figured the on base percentage of those 50 games.  

Then, for all additional games, I scored it "1" (in) if the player reached base in that game by hit, walk or hit batsman.   If he did not reach base in the game by hit, walk or hit batsman, the game was still "1" (in) if the group’s cumulative on base percentage so far was over .350 for the fast group, or over .375 for the slow group.   All games were included in the study unless (1) the player did not reach base in the game, AND (2) the group was in a position in which they needed runners on base to increase their on base percentage, meaning the group’s cumulative figure was at or below its target.  Of course, 1-for-4 or 1-for-5 does not increase the on base percentage, but that doesn’t matter; if you just exclude a few games in which the player did not reach base at all, the on base percentage will stay above the target. 
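
The selection rule described above can be sketched as follows.  This is an illustration under stated assumptions, not the actual procedure's code: the field names "on_base" (times reached by hit, walk or hit batsman) and "pa" (the on base percentage denominator for the game), the seed, and the function name are all hypothetical.

```python
import random


def select_games(games, target_obp, warmup=50, seed=0):
    """Walk through the games in random order, keeping a running OBP.

    The first `warmup` games are always in.  After that a game is in if the
    player reached base, or if the group's cumulative OBP so far is already
    over the target; otherwise the game is left out.
    """
    order = list(games)
    random.Random(seed).shuffle(order)   # random sequence within the group

    included = []
    on_base = pa = 0
    for i, game in enumerate(order):
        reached = game["on_base"] > 0
        above_target = pa > 0 and on_base / pa > target_obp
        if i < warmup or reached or above_target:
            included.append(game)
            on_base += game["on_base"]
            pa += game["pa"]
    return included, (on_base / pa if pa else 0.0)


# Toy group: 10 games with no times on base, 10 games of 1-for-4.
toy = [{"on_base": 0, "pa": 4}] * 10 + [{"on_base": 1, "pa": 4}] * 10
kept, obp = select_games(toy, target_obp=0.99, warmup=0)
print(len(kept), obp)
```

With an unreachable target like .990, every game in which the player failed to reach base is left out, which is the extreme case of the rule; with a target of .350 or .375 only a small share of those games needs to go.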

Of the 15,052 leadoff games by fast players, 14,747 wound up being included in the sample; you just need to exclude 2% of the games to drive the on base percentage of that group up to .350.   Of the 15,051 leadoff games by slower players, 13,765 wound up included in the sample, as you need to exclude 8.5% of the games in order to drive the on base percentage up to .375. 

            You may note that there is nothing in this system to comply with the "all other things being equal" mandate.  I counted on the nature of the game to ensure that the other things, the unmonitored things, would even out, and this was more or less true; they generally did even out, more or less, although you can quibble about that if you want to. 

            Anyway, at the conclusion of this I had 14,747 games by "fast" leadoff men with an on base percentage of .350, and 13,765 games by "slow" leadoff men with an on base percentage of .375.   This is equivalent to 91 162-game seasons by fast leadoff men, and 85 162-game seasons by slow leadoff men. 

            Conclusion:  it is probably but not certainly true that the advantage lies with the slower leadoff men with the higher on base percentage.  The advantage, if there is one, if it does in fact exist, is very, very small, certainly no more than five runs per season, and probably less than half of that.   However, it does appear to me, as best I am able to see, that the advantage lies in that direction.

            I can see that I am going to have to break this data down into two or three lines.  This is the "fast" leadoff men:

GS     AB     R      H      2B     3B     HR     RBI    BB     IBB    SO     HBP    SH     SF
14747  61712  9827   18112  2540   818    883    5050   5103   466    7547   367    513    290
162    678    108    199    28     9      10     55     56     5      83     4      6      3

XI     ROE    GDP    SB     CS     AVG    OBP    SLG    Outs   RC
8      1073   640    4528   1296   .293   .350   .404   46339  9409
0      12     7      50     14     .293   .350   .404   509    103

 

            "XI" is "reached base on catcher’s interference."   The fast leadoff men hit .293, had 199 hits per season, 28 doubles, 9 triples, 10 homers, 55 RBI, 56 walks.   This is the "slow" leadoff men:

GS     AB     R      H      2B     3B     HR     RBI    BB     IBB    SO     HBP    SH     SF
13765  56373  9255   16968  2724   558    1226   5559   6448   379    5953   394    446    281
162    663    109    200    32     7      14     65     76     4      70     5      5      3

XI     ROE    GDP    SB     CS     AVG    OBP    SLG    Outs   RC
4      949    819    1166   808    .301   .375   .434   41759  9356
0      11     10     14     10     .301   .375   .434   491    110

 

            The "slow" leadoff men hit .301, had 32 doubles, 7 triples, 14 homers, 65 RBI, 76 walks.  The OPS for the fast group was .754, for the slow group .809, but most if not all of that advantage was offset by the speed advantages.   The slow players had one more hit per season (200 to 199) and scored one more run per season (109 to 108).  Per 162 games, the fast players stole 36 more bases (50 to 14) with only 4 more caught stealing (14 to 10).  The fast players hit more triples (9 to 7), reached base more often on an error (12 to 11), and grounded into fewer double plays (7 to 10).
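
            For anyone who wants to check the arithmetic, the rate lines and per-162 figures can be rebuilt from the totals in the two charts.  The sketch below uses the standard definitions of AVG, OBP and SLG; the outs formula (AB minus H, plus SH, SF, GDP and CS) is an assumption, though it does reproduce the Outs column for both groups.  The dictionary keys and function names are just labels for this example.

```python
# Totals copied from the two charts (D = doubles, T = triples).
FAST = dict(GS=14747, AB=61712, H=18112, D=2540, T=818, HR=883, BB=5103,
            HBP=367, SH=513, SF=290, ROE=1073, GDP=640, SB=4528, CS=1296)
SLOW = dict(GS=13765, AB=56373, H=16968, D=2724, T=558, HR=1226, BB=6448,
            HBP=394, SH=446, SF=281, ROE=949, GDP=819, SB=1166, CS=808)


def rates(t):
    """AVG, OBP, SLG, OPS and outs from a group's totals."""
    avg = t["H"] / t["AB"]
    obp = (t["H"] + t["BB"] + t["HBP"]) / (t["AB"] + t["BB"] + t["HBP"] + t["SF"])
    total_bases = t["H"] + t["D"] + 2 * t["T"] + 3 * t["HR"]
    slg = total_bases / t["AB"]
    # Assumed outs definition; it matches the Outs column in both charts.
    outs = t["AB"] - t["H"] + t["SH"] + t["SF"] + t["GDP"] + t["CS"]
    return {"AVG": round(avg, 3), "OBP": round(obp, 3), "SLG": round(slg, 3),
            "OPS": round(obp + slg, 3), "Outs": outs}


def per_162(t, value):
    """Scale a group total to a 162-game season."""
    return round(value * 162 / t["GS"])


for name, group in (("fast", FAST), ("slow", SLOW)):
    r = rates(group)
    print(name, r, "SB/162:", per_162(group, group["SB"]),
          "ROE/162:", per_162(group, group["ROE"]),
          "Outs/162:", per_162(group, r["Outs"]))
```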

            The simplest analysis is that the slow leadoff men scored 1 more run per season (109 to 108), plus they made 18 fewer outs per season, which has a value of about three runs.  The slow leadoff men are about 4 runs per season better. 

            The slow leadoff men created 7 more runs per season (110 to 103), but whereas the fast players scored MORE runs than they created (108 to 103), the slow players scored less than they created (109 to 110). 

            OK, just for the halibut, here is a count of the games by each player included in the "fast leadoff men" study.  Remember that some games were arbitrarily excluded from the study because the player did not reach base in the game.   Lou Brock had another 53 games in which he was classified as a fast leadoff man, but those games were not included in the study because they were excluded in order to maintain the .350 on base percentage.  He also had another 5 games in which he was classified as a slower leadoff man, but those were excluded in order to maintain the .375 on base percentage of the group.  Fast:

First N  Last          Total
Lou      Brock         1786
Ichiro   Suzuki        1567
Bert     Campaneris    1315
Willie   Wilson        1314
Maury    Wills         1201
Davey    Lopes         915
Luis     Aparicio      877
Paul     Molitor       841
Bobby    Bonds         826
Matty    Alou          474
Cesar    Tovar         467
Jose     Cardenal      404
Miguel   Dilone        375
Zoilo    Versalles     285
Jim      Gilliam       273
Dale     Mitchell      211
Dick     Howser        202
Kirby    Puckett       195
Ray      Lankford      187
Vic      Davalillo     163
Willie   Davis         140
Roy      White         74
Al       Spangler      68
Cesar    Cedeno        68
Lee      Maye          66
Dick     McAuliffe     62
Red      Schoendienst  59
Paul     Blair         52
Tony     Phillips      50
Amos     Otis          44
Gene     Clines        28
Vada     Pinson        26
Toby     Harrah        20
Mack     Jones         19
Don      Money         13
Mark     Belanger      11
Jose     Cruz          10
Manny    Mota          8
Dick     Allen         7
Johnny   Callison      7
Roger    Maris         6
Lou      Boudreau      5
Larry    Doby          4
Bob      Skinner       4
Harvey   Kuenn         3
Fred     Lynn          3
Bob      Cerv          3
Jim      Fregosi       3
Floyd    Robinson      2
Jimmy    Wynn          2
Chet     Lemon         2

 

            And slow:

First N  Last          Total    First N  Last          Total
Tony     Phillips      1242     Jerry    Lumpe         62
Harvey   Kuenn         873      Ken      Boyer         61
Jim      Gilliam       699      Bobby    Bonds         60
Felipe   Alou          668      Minnie   Minoso        58
Paul     Molitor       650      Jim      Northrup      56
Dick     McAuliffe     588      Woodie   Held          56
Red      Schoendienst  571      Cecil    Cooper        55
Brian    Downing       560      Norm     Siebern       50
Cesar    Tovar         464      Floyd    Robinson      48
Dale     Mitchell      408      Mack     Jones         48
Lou      Boudreau      341      Eddie    Mathews       46
Luis     Aparicio      329      Jim      Fregosi       37
Vic      Power         268      Bill     Mazeroski     36
Zoilo    Versalles     250      Johnny   Callison      32
Gil      McDougald     246      Dusty    Baker         31
Dick     Howser        244      Lou      Brock         29
Cookie   Rojas         235      Cesar    Cedeno        27
Maury    Wills         226      Will     Clark         27
Davey    Lopes         220      Jack     Clark         18
Ichiro   Suzuki        202      Bob      Allison       17
Gene     Clines        199      Ed       Kirkpatrick   17
Joe      DeMaestri     196      Enos     Slaughter     16
Kirby    Puckett       193      Darrell  Evans         16
Don      Money         184      Nelson   Mathews       13
Lee      Maye          178      Roger    Maris         11
Wayne    Causey        161      Willie   Wilson        11
Roberto  Clemente      144      Jim Ray  Hart          10
Bill     Madlock       132      Bob      Cerv          10
Bobby    Del Greco     118      Manny    Mota          10
Chet     Lemon         115      Frank J  Thomas        8
Mark     Belanger      113      Bill     White         7
Ken      Singleton     112      Ray      Lankford      7
Al       Spangler      112      Rusty    Staub         6
Jimmy    Wynn          110      Bob      Elliott       4
Roy      White         107      Jeff     Heath         4
Vic      Davalillo     103      Dick     Groat         3
Denis    Menke         100      Fred     Lynn          3
Toby     Harrah        97       Jose     Cruz          3
Bobby    Doerr         95       Dave     Parker        3
Vada     Pinson        93       Sid      Gordon        3
Paul     Blair         92       Dick     Allen         2
Bob      Skinner       91       Norm     Cash          2
Matty    Alou          89       Doug     DeCinces      2
Rico     Petrocelli    79       Harmon   Killebrew     2
Jim      Hickman       79       Ed       Charles       1
Jose     Cardenal      77       Frank    Howard        1
Tito     Francona      76       Al       Oliver        1
Bert     Campaneris    74       Andy     Van Slyke     1
Gino     Cimoli        65       Willie   Davis         1
Joe      Gordon        64       Mark     McGwire       1

 

            There are actually 100 players, an even 100, who are listed in the study as slower leadoff men.   Most of them never made it into the "fast leadoff men" group, as Brian Downing did not.   Thanks for reading.

 

            Lonely days,

            Lonely Nights.

            What would I do without my readers?

 

 
 

COMMENTS (17 Comments, most recent shown first)

MarisFan61
Re whether the difference between exactly how Owen put it and what Bill looked at makes a difference, i.e. whether it keeps these results from being applicable to what Owen asked:

I think it depends on whether the effect of slowness in a leadoff man is a continuum.
If it is, I think these results do apply to it. If it isn't, they might not.
1:34 PM May 25th
 
MarisFan61
P.S. Please forget that last post -- sort of true but doesn't matter for anything.
6:00 PM May 24th
 
MarisFan61
Bill: Upon further review... :-)

I wasn't wrong in thinking you didn't include reached-on-error in ON-BASE-AVERAGE, and so, I think that what Evanecurb and I said about it does apply.

(It was included in the study as a separate consideration.)
5:56 PM May 24th
 
MarisFan61
Comments:

A thing I realized while I was doing this:
I'm afraid this will seem to some (especially Bill) :-) like eye-rolling semantic irrelevance, but:
Bill didn't really study what Owen asked.

There's an important difference between:
(1) "fast"/"slow," which was how Owen asked it, and
(2) relatively faster or slower than each other in pairs of players, which is what Bill looked at.

This didn't hit me until I did that "sub-study" and therefore was thinking of it in a more detailed way. The first thing was when I got to Chet Lemon and Andre Dawson, in which Lemon showed as the "slow" guy.
First I just wondered, how correct is it to consider Lemon slower than Dawson, but quickly I alit [spell-check says this isn't a word???] on the more basic thing:
Chet Lemon was SLOW??

Of course he wasn't. And that kind of thing applies to a few others of those known pairings, and must have applied to many many others of the thousands that were used in this study.

Looking at groups of comparable players who were relatively slower or faster than each other is completely different from looking at groups of "slow" and "fast" players.
I don't know that these results don't reflect on what Owen asked, but I do know that what was looked at in the study is completely different from what he asked.

-----------------

That's related, sort of, to the thing I raised about why Bill allowed the difference in on-base-average between a pair of players to be as small as .010. (I suggested requiring it to be at least maybe .020, because a difference as small as .010 is basically just "noise.")

I thought we'd be looking for a QUALITATIVE difference between the players in each of the two categories (on-base and speed), not just a quantitative one, and I didn't think a difference of .010 is enough to mean that one player 'really' was more of an on-base guy than the other guy.

In that thing as well as the speed thing, I see the differential we're looking for as not just a numerical thing but a CONCEPTUAL thing, regarding the characteristics of the players.
But, be that as it may, what I said here on the "speed" question is a clearer thing: What Bill looked at is simply not the same as what Owen said, or even similar; and maybe it still reflects on the question Owen asked but I can't tell if it does.

====================

Separate thing, completely unrelated, but related to what Evanecurb and I brought up, and to Bill's good choice to include reached-on-error:

Shouldn't "reached-on-error" be included in On-Base-Average?
Why the **** isn't it?
Pardon the obscenity :-) but, why isn't it???
It has probably been discussed here before -- I can't imagine it hasn't been -- but I don't recall it.

I cannot think of any GOOD reason that it isn't included.
(I can think of bad ones.)

Memo to the field of sabermetrics: It's not too late. :-)
Start including "ROE" in on-base-average, including retroactively.
5:46 PM May 24th
 
MarisFan61
Bill: Sorry! Indeed I missed that. Good of course that you included it!

-----------

Since (1) I did do the little 'sub-study', and (2) thought about it further, I'll show the results, then a couple of comments.


First, if I were wearing a hat, arguably I'd have to eat it, because while the results were in the direction I said, they weren't pronounced enough for me to celebrate. Judging this in the way I usually would (with no dog in the fight), I'd call it no-decision.

Here are the details.
I used these pairs, which are all the ones that Bill mentioned, omitting the ones he said were bad (Rice-Lowrey, Kaline-Vander Wal) and the one from early in the 20th century (Witt-Milan) because I didn't find "ROE" data for then.

Reed-Garcia
Lemon-Dawson
Ordonez-Mondesi
Clark-Callison
Maris-Kirkland
Dw. Evans-Bobby Bonds
Elliott-Traynor
Choi-Pena
Ordonez-Herman

The faster guy had a higher ROE average per plate appearance for the year in 5 of the 9 pairs, which is nothing.

The overall ROE rates for the two "groups," i.e. 'slower' group and 'faster' group, showed more impressively but still not enough to say anything.
These are the averages of the player-averages in each 'group,' not the averages from gross summings of ROE's and PA's of all the players.

"Slower" group: average of 1.15 reached-on-error per 100 plate appearances

"Faster" group: average of 0.96 reached-on-error per 100 plate appearances

(I'll hold onto the detailed data for a while in case anyone wants to ask about it.)


Further comments in separate post.....
5:12 PM May 24th
 
bjames
Responding to John Rickert's question. . .yes, there is the risk of slicing the data so thinly that it can't carry the study, but. . . I'll look at it. Sounds like it might work. I'll have to put a tight control on the year.
4:43 PM May 24th
 
bjames
Would it be reasonable to use this data to match fast and slow players with the same on-base percentage and see if there is a noticeable speed advantage? And if so, to look at it for several on-base percentages such as .300,.325,.350,.375,.400 to see if such an advantage would increase or decrease as the players get on base more often? Or if it disappears at some levels and is present at others? That does run the risk of splitting the cohorts too finely.
4:42 PM May 24th
 
bjames
Evanecurb's point is important.
Other things equal, faster players DO reach-on-error more.
(We've looked at that a few times.)

If that were added to this mix, it would reduce the difference further toward 0 -- probably not all the way.


That WAS considered in the study. Read the study more carefully; that WAS factored in.
4:41 PM May 24th
 
MarisFan61
....doing a "study within the study" (using Bill's pairs) to check on it.
I'll eat my hat if it doesn't show what Evanecurb and I said.
(Easy for me to say, I'm not wearing one.)
4:02 PM May 24th
 
MarisFan61
Evanecurb's point is important.
Other things equal, faster players DO reach-on-error more.
(We've looked at that a few times.)

If that were added to this mix, it would reduce the difference further toward 0 -- probably not all the way.
1:50 PM May 24th
 
frisco
Haven't really delved into the article but love the Bee Gees reference at the end.

My Best-Carey
12:35 PM May 24th
 
willibphx
Totally away from the topic at hand but the data and resulting analysis looks like it could be used to study aging patterns as well which could be very interesting.
6:42 AM May 24th
 
OwenH
To shthar's question: Fast players, if they have more range in the field, will get to more chances, thus giving them more opportunities to commit errors -- especially since the extra chances they get to would often be the most difficult plays -- and they might get charged for some more errors for missing those plays, even if a slower player would have let the ball by for a hit ...right? It seems logical, and I feel like I've read about this before. Would make the most sense for outfielders who have to run farther to get to balls, and have the most need for speed on defense.
12:02 AM May 24th
 
shthar
But do fast players make more errors than slow players?


10:19 PM May 23rd
 
evanecurb
I have a belief that fast players reach base on errors more often than slow players. If true, the difference in OBP is less than .025.
10:07 PM May 23rd
 
jrickert
Would it be reasonable to use this data to match fast and slow players with the same on-base percentage and see if there is a noticeable speed advantage? And if so, to look at it for several on-base percentages such as .300,.325,.350,.375,.400 to see if such an advantage would increase or decrease as the players get on base more often? Or if it disappears at some levels and is present at others? That does run the risk of splitting the cohorts too finely.
9:00 PM May 23rd
 
MarisFan61
Cool!
My main take on what the results mean:
Owen did a great job on specifying criteria by which this is a real question, i.e. it's very hard to give a definitive answer, because it's very close to equal.
4:01 PM May 23rd
 
 
©2020 Be Jolly, Inc. All Rights Reserved.