The 96 Families of Hitters

August 27, 2008

            Those of you who have read my stuff over the years know that I have long been looking for some way to classify hitters into “families”.   I’ve tried a couple of other things to do this, but I have another idea here, and I think maybe this is it.

                        Each player has a unique, but not totally unique, ratio of doubles to triples to homers.   The most common extra base hit ratio is what I call a 6-1-3 ratio—about twice as many doubles as homers, and about three times as many homers as triples.   30 doubles, 5 triples, 15 homers.     Amos Otis, for example, had 374 doubles in his career, 193 home runs—about a 2-1 ratio—and had 66 triples.  That’s a 6-1-3 ratio (374-66-193).    Jay Bell had a 6-1-3 ratio (394-67-195), and Roy White (300-51-160), and Kirby Puckett (414-57-207).   It’s a common ratio.               

Over the last couple of years I have gotten into the habit, without really thinking it through, of categorizing players by this ratio. I look at Paul Konerko and say, “Oh, that’s a 5-0-5 pattern”, or whatever.   It occurred to me recently that perhaps I could make this into a formal categorization process.

                        But first, the obligatory “Why are we doing this?” paragraphs.

                        Whenever you can identify players who have two things or three things or four things in common, it is extremely likely that they will also have other things in common.  If you identify the most-similar player to a player born in California, very often you will find another player born in California.   If you identify the most-similar player to a 6-foot-5-inch left-handed pitcher, very often you will find another 6-foot-5-inch lefty.   If you identify the most similar hitter to someone from the 1960s, very often you find someone else from the 1960s. 

                        Thus, if you can identify “families” of hitters, it is possible that you can learn things that you didn’t know by generalizing about the characteristics of the family.   If half the members of a family ground into a lot of double plays and you have no GIDP data about the other half, it is likely that they also will have grounded into a good many double plays.   If you observe that some members of a family age well, you might be able to predict that other members of the family, still in mid-career, will also age well.   If you find that some members of a family have difficulty making the jump from A ball to AA, you might anticipate that other members of the family might have the same problem.   If you notice that some members of a family are prone to a particular type of injury, you might want to do preventive training for other members of the family.   If one member of a family is arrested after being caught with a 14-year-old in his room, you might not wish to generalize from that, or even bring it up at the family re-union.     

                        Classification is central to the process of observation.   This is true in psychology, in chemistry, in zoology and quality control.  It is true in sabermetrics.  For years I have looked for some way to classify hitters.   Now I think that I have it.

 

OK, let’s move on.  The first thing we have to do is to figure, for each ten extra base hits that each player has, how many are doubles, how many are triples, how many are home runs—rounded off to the nearest integer, of course.   Well, sometimes you can’t round it off to the nearest integer and get ten.   Sometimes you get eleven, and sometimes you get nine.   Harmon Killebrew had 32.7% doubles, 2.7% triples, and 64.6% home runs.   If you round that off you get 3-0-6, which adds up to nine, whereas for Hal McRae you get 7-1-3, and for Joe Rudi, 6-1-4, both of which add up to eleven.

                        We don’t want some players rounding off to nine or eleven, because we’re trying to put hitters into groups with similar hitters, and these nines and elevens tend to take players out of the groups.   The next thing we have to do, then, is to round the numbers off so that they add up to ten.   I would explain how we do this if I had any reason to believe that you would have trouble figuring it out on your own.

                        What if a player has 140 doubles, 30 triples, 30 homers?   That comes out to 7-1.500-1.500.   How do you break the tie?

                        In case of ties I assigned the overage to the lower category—that is, to triples rather than homers, and to doubles rather than triples or homers.   This player would be classified a 7-2-1.  
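The article does not spell out how the numbers are forced to add up to ten, so the following is only a sketch of one way it could be done. It assumes a largest-remainder approach: give each category its whole tenths, then hand the missing tenths to the categories with the biggest leftovers, breaking ties in favor of doubles over triples over homers as described above. The function name `ten_pattern` is made up for illustration.

```python
def ten_pattern(doubles, triples, homers):
    """Return (doubles, triples, homers) shares out of ten extra base hits.

    Assumption: largest-remainder rounding. The article only states the
    tie-break rule, not the full rounding procedure.
    """
    counts = (doubles, triples, homers)
    total = sum(counts)
    if total == 0:
        raise ValueError("player has no extra base hits")

    # Integer arithmetic avoids floating-point surprises on exact ties.
    shares = [(10 * c) // total for c in counts]      # whole tenths earned
    remainders = [(10 * c) % total for c in counts]   # leftover numerators

    # sorted() is stable, so equal remainders keep their original order:
    # doubles before triples before homers, matching the tie-break rule.
    order = sorted(range(3), key=lambda i: -remainders[i])
    for i in order[: 10 - sum(shares)]:
        shares[i] += 1
    return tuple(shares)


print(ten_pattern(374, 66, 193))   # Amos Otis
print(ten_pattern(140, 30, 30))    # the tie example above
```

Under this assumption Amos Otis comes out 6-1-3 and the 140-30-30 player comes out 7-2-1, which agrees with the classifications given in the text.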

OK, we’ve got everybody’s extra base hits now grouped into a “ten-pattern”.   In theory there are 66 possible ten-patterns.  However, of those 66 possible ten-patterns, there are only 26 which have actually occurred, among players with 1000 or more plate appearances.   The 26 which have actually occurred are:

                        9-1-0

                        9-0-1

                       

                        8-2-0

                        8-1-1

                        8-0-2

 

                        7-3-0

                        7-2-1

                        7-1-2

                        7-0-3

 

                        6-4-0

                        6-3-1

                        6-2-2

                        6-1-3

                        6-0-4

 

                        5-4-1

                        5-3-2

                        5-2-3

                        5-1-4

                        5-0-5

 

                        4-5-1

                        4-4-2

                        4-2-4

                        4-1-5

                        4-0-6

 

                        3-1-6

                        3-0-7

 

                        The only player in history who was 4-5-1 was Joe Visner (1885-1891), while the only two players who were 4-4-2 were Lew Whistler (1890-1893) and Jeff Stone (1983-1990).  All of them did it in careers that barely cleared the 1000-plate appearance mark.   These are not ratios that a player would likely sustain over a longer career.
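A quick way to check the count of possible ten-patterns is to enumerate them. The sketch below treats a ten-pattern as any triple of non-negative integers (doubles, triples, homers) that sums to ten, which is an assumption about what "possible" means here; the `OBSERVED` list is copied from the 26 patterns listed above.

```python
from itertools import product


def all_ten_patterns():
    """Every (doubles, triples, homers) triple of non-negative ints summing to ten."""
    return [(d, t, 10 - d - t)
            for d, t in product(range(11), repeat=2)
            if d + t <= 10]


# The 26 patterns the article reports as having actually occurred.
OBSERVED = [
    (9, 1, 0), (9, 0, 1),
    (8, 2, 0), (8, 1, 1), (8, 0, 2),
    (7, 3, 0), (7, 2, 1), (7, 1, 2), (7, 0, 3),
    (6, 4, 0), (6, 3, 1), (6, 2, 2), (6, 1, 3), (6, 0, 4),
    (5, 4, 1), (5, 3, 2), (5, 2, 3), (5, 1, 4), (5, 0, 5),
    (4, 5, 1), (4, 4, 2), (4, 2, 4), (4, 1, 5), (4, 0, 6),
    (3, 1, 6), (3, 0, 7),
]

patterns = all_ten_patterns()
never_seen = [p for p in patterns if p not in OBSERVED]

print(len(patterns), len(OBSERVED), len(never_seen))
```

Counted this way there are 66 possible patterns, of which 26 have occurred and 40 have not.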

                       

                        OK, so we can sort players into those 26 categories, and, generally speaking, we can observe that some players within those categories do look very much alike.   There is an obvious problem.     Alex Rodriguez is 4-0-6, and so is Russell Branyan.   Can one really say that Russell Branyan and Alex Rodriguez are members of the same family of hitters?

                        Of course not.  Stan Musial and Craig Paquette are both 5-1-4s.    We have to do something to recognize quality of contribution distinctions.  After experimenting with different ways to slice the loaf, I decided that the most obvious one worked the best:  OPS.  

                        Players with an OPS over .900 I marked “A”.

                        Players with an OPS of .8334 to .8999 I marked “B”.

                        Players with an OPS of .7667 to .8333 I marked “C”.

                        Players with an OPS of .7000 to .7666 I marked “D”.

                        Players with an OPS of .6334 to .6999 I marked “E”.

                        Players with an OPS of .5667 to .6333 I marked “F”.

                        Players with an OPS below .5666 I marked “G”.

 

Many of the “Fs” and “Gs” disappeared later in the process, though, since there were rarely enough of them to form families.
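The letter grades above amount to a simple lookup. Here is a minimal sketch, with two assumptions of mine that the list leaves open: an OPS of exactly .900 is treated as an "A", and OPS is rounded to four decimals before it is compared. The names `ops_grade` and `natural_code` are invented for this example.

```python
# (lower bound, letter), best band first; bounds copied from the list above.
OPS_BANDS = [
    (0.9000, "A"),
    (0.8334, "B"),
    (0.7667, "C"),
    (0.7000, "D"),
    (0.6334, "E"),
    (0.5667, "F"),
]


def ops_grade(ops):
    """Map a career OPS to a letter from A to G."""
    ops = round(ops, 4)          # assumption: compare at four decimals
    for lower, letter in OPS_BANDS:
        if ops >= lower:
            return letter
    return "G"                   # everything below the F band


def natural_code(pattern, ops):
    """Combine a ten-pattern such as (4, 0, 6) with an OPS grade: '406 A'."""
    d, t, h = pattern
    return f"{d}{t}{h} {ops_grade(ops)}"


# Illustrative OPS values only, not any player's real figures.
print(natural_code((4, 0, 6), 0.950))
print(natural_code((4, 0, 6), 0.800))
```

With those made-up inputs the two calls give a 406 A and a 406 C, the same pair of codes the article uses to separate Rodriguez from Branyan.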

            Anyway, with his very high OPS Alex Rodriguez was now classified as 406 A, while Russell Branyan was classified as 406 C.                  This would have given us, in theory, 182 “families” of players—26 times 7—except that about 40% of those were empty sets.  

 

The goal of the system is to put hitters into groups with similar hitters.   The only two players in history who have a ten-pattern of 3-0-7 are Mark McGwire and Harmon Killebrew.    McGwire, however, has an OPS over .900, which makes him 307 A, and Killebrew has an OPS just below .900, which makes him 307 B.  

                        Obviously Killebrew and McGwire are similar hitters and belong in the same family, but that only makes a family of two.   Our goal in this particular exercise is not to celebrate the uniqueness of Mark McGwire, but to place him in a group with those hitters to whom he is most similar.  There are other hitters who are fairly similar—Sammy Sosa, Hank Sauer, Rocky Colavito, Frank Howard, etc.—so why don’t we put him where they are?  Those guys are in group 406 B, so I moved Killebrew and McGwire to group 406 B.

The rule I made up in my head was that each player should be in a family of at least ten players; no groups of fewer than ten.   In practice, I had many, many players who were in groups of eight and nine, and only a manageable number in groups of seven or fewer, so I adjusted the rule to families of at least eight.

In a few cases these “forced classifications” were debatable.    There are only six players in history who have a “natural code” of 406 A, so I tried to group the 406 As with the 406 Bs, and, since there were more 406 Bs than 406 As, I called them all 406 B.  But this put Alex Rodriguez (naturally a 406 A) in the same family of hitters with Harmon Killebrew (naturally a 307 B), and A-Rod and Killer are both great hitters, but not really that similar.   I wound up moving two of the 406 As (A-Rod and Ken Griffey Jr.) into group 415 A, a group which also includes Mel Ott, Willie Mays, Hank Aaron, Babe Ruth and Barry Bonds, and moved the other four 406 As (Jim Thome, Adam Dunn, Ryan Howard and Mike Piazza) into group 406 B, with Killebrew, McGwire and Rocky Colavito.
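Where each undersized group ends up is a judgment call, but finding the groups that need a decision is mechanical. The sketch below flags families under the eight-player minimum and then applies the moves described above by hand. The helper names are hypothetical, and the `natural` dictionary holds only the handful of players named in this section, not the real rosters.

```python
from collections import defaultdict

MIN_FAMILY_SIZE = 8   # the adjusted rule described above


def group_by_code(codes):
    """Turn {player: family code} into {family code: [players]}."""
    families = defaultdict(list)
    for player, code in codes.items():
        families[code].append(player)
    return dict(families)


def undersized(families, minimum=MIN_FAMILY_SIZE):
    """Families too small to stand on their own."""
    return {code: members for code, members in families.items()
            if len(members) < minimum}


def reassign(codes, moves):
    """Return a copy of codes with forced classifications applied."""
    updated = dict(codes)
    updated.update(moves)
    return updated


natural = {
    "Mark McGwire": "307 A", "Harmon Killebrew": "307 B",
    "Alex Rodriguez": "406 A", "Ken Griffey Jr.": "406 A",
    "Jim Thome": "406 A", "Adam Dunn": "406 A",
    "Ryan Howard": "406 A", "Mike Piazza": "406 A",
}
print(sorted(undersized(group_by_code(natural))))

# The moves are the ones the article reports; they are entered by hand here.
moves = {
    "Mark McGwire": "406 B", "Harmon Killebrew": "406 B",
    "Alex Rodriguez": "415 A", "Ken Griffey Jr.": "415 A",
    "Jim Thome": "406 B", "Adam Dunn": "406 B",
    "Ryan Howard": "406 B", "Mike Piazza": "406 B",
}
final = reassign(natural, moves)
```

Keeping the natural code and the forced code as separate mappings makes it easy to revisit a borderline sort later without losing track of where a player started.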

                        It’s not a perfect process, and if you can find some better way to make those hard borderline sorts, well, that’s what separates knowledge from BS; knowledge is something you can build on.  Anyway, my next problem was large, sloppy groups.   Large families become ragged and ill-defined.   The largest family remaining at this point was 613 D, a group which included 173 players ranging from Khalil Greene to Don Mueller.    Don Mueller was a 1950s outfielder who never struck out and would typically hit around .300 with less than ten homers a year.   How does he wind up in a group with Khalil Greene?

He doesn’t belong with him, of course, and I decided to break up the large families into groups of 80 or less; actually 81 or less, since the one family which had 81 seemed fairly cohesive, and I didn’t see that I needed to split it up.   I decided to further divide the too-large families by the ratio of base hits to secondary bases, secondary bases being defined as extra bases on hits, plus walks, plus stolen bases.    I divided 613 D into three groups: 613 D1, which contains Khalil Greene along with players like Gary Redus, Oddibe McDowell, Lee Mazzilli, Lloyd Moseby, Tony Phillips, Tommy Harper, Vince DiMaggio, Ken Henderson and Devon White; 613 D2, which contains players like Emil Brown, Ed Charles, Gabe Kapler, Jose Cardenal and Claudell Washington; and 613 D3, which contains guys who hit .300 sometimes like Don Mueller, Jimmy Piersall, Bobby Avila, Cleon Jones and Vic Power.   613 D1 has an average batting average of .254 but with a good many strikeouts and walks and a few homers, while 613 D3 has an average batting average of .274 with few walks, fewer strikeouts, fewer stolen bases and slightly fewer home runs.
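The split of an oversized family can be sketched as follows. The definition of secondary bases comes from the paragraph above; everything else is an assumption made for illustration: that the family is cut into the smallest number of roughly equal groups that keeps each at 81 or fewer, and that subgroup 1 holds the lowest hits-to-secondary-bases ratios (which fits the description of 613 D1 as the lower-average, more-walks group). The article does not say where the cut points actually fell.

```python
from math import ceil


def hits_to_secondary_ratio(hits, total_bases, walks, stolen_bases):
    """Base hits divided by secondary bases (extra bases on hits + BB + SB)."""
    secondary = (total_bases - hits) + walks + stolen_bases
    if secondary == 0:
        return float("inf")
    return hits / secondary


def split_family(ratios, max_size=81):
    """Split {player: ratio} into ordered subgroups of at most max_size.

    Returns a list of player lists; index 0 corresponds to subgroup 1.
    """
    ranked = sorted(ratios, key=ratios.get)           # lowest ratio first
    n_groups = max(1, ceil(len(ranked) / max_size))
    base, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)         # spread the remainder
        groups.append(ranked[start:start + size])
        start += size
    return groups


# Synthetic stand-in for a 173-player family; not real player data.
fake_family = {f"player_{i}": 0.5 + i / 100 for i in range(173)}
print([len(g) for g in split_family(fake_family)])
```

A 173-player family splits three ways under these assumptions, in the same spirit as the division of 613 D into D1, D2 and D3.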

Of course, smaller groups also tend to become ragged sometimes; that is, smaller families of players sometimes contain players that one would not tend to think of as similar.    One more problem.   As I was writing up the system, explaining who goes where, I became aware that the exceptionally high-walk players, like Eddie Yost, Gene Tenace and Mickey Tettleton, stuck out like sore thumbs in this system as I had established it.   They just didn’t fit; their exceptionally high walk totals caused them to be classified, given the idiosyncrasies of this system, with players whose batting averages were 50 points higher and who also had more power.   I had to create two special groups, ending with a “W” code, to put some of the high-walk players into families together.

 

In a perfect categorization system,

a)      players who were very similar would always be in the same family, and

b)      players who were not similar would never be in the same family.

This isn’t a perfect system, and it doesn’t meet either of those criteria perfectly.  Because it relies on drawing lines between groups of players, and because those lines have to be arbitrarily determined, it does sometimes separate players who seem very similar.   Because it doesn’t look at every facet of the player’s skills, it sometimes puts together players who seem very dissimilar.   It isn’t a perfect system.   But I have been wanting to have a way to classify hitters into families for a long time, and this is the best thing I’ve come up with.

                        I wound up with 96 families of hitters, which include all non-pitchers in history with 1000 or more plate appearances.  

 

The 96 Families

                        In a separate article (Appendix to 96 Families), I’ll list all players in history and the family they are in—in fact, I’ll list them twice, once by the family and once alphabetically.   But here I wanted to offer a few more general observations about the families of hitters.

                        The system is intended to be sort of intuitive.  Who would be in family 811 C, for example? 

                        Well, let’s see. … 811 means singles hitters, obviously, since the home run number is very low—eight doubles for each homer.   That means five homers a season or less.  C indicates hitters who have an OPS around .800, and .800 is a high OPS for a singles hitter, so 811 C would be high-average singles hitters.  

Freddy Sanchez is 811 C—the only contemporary player who is 811 C.   Billy Herman was 811 C, and Lou Boudreau, Joe Sewell.    Wade Boggs is in group 811 C; Boggs has a natural code of 811 B, but is the only player in history who would be 811 B, so the family 811 C were his nearest relatives.   No one in history is or would naturally be 811 A.   Singles hitters don’t have an OPS over .900.   So Freddy Sanchez is classed in a family with Billy Herman, Lou Boudreau, Joe Sewell, Wade Boggs and a handful of other guys; it’s a small family, eight players.

                        721 C are the same type of guys, only they run well enough to hit a few more triples (Rod Carew, Frankie Frisch, Lyman Bostock, Ben Chapman), and 721 B are the similar hitters only with even higher averages or a bit more power (Paul Waner, Nap Lajoie, Pete Browning, Heinie Manush.)   Tris Speaker and Ed Delahanty, who would naturally be classified 721 A, are included in 721 B because they would otherwise be a family of two.
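Since the codes are meant to be read at a glance, it may help to see one taken apart. This is only a sketch of the naming convention as it appears in this article: three digits for the ten-pattern, a letter for the OPS band (or "W" for the special high-walk groups), and an optional trailing digit for families that were split. The function name `parse_family` is hypothetical.

```python
import re

# Three pattern digits, a space, a grade letter (A-G or W), optional subgroup.
_FAMILY_CODE = re.compile(r"^(\d)(\d)(\d) ([A-GW])(\d?)$")


def parse_family(code):
    """Break a family code such as '613 D2' into its parts."""
    match = _FAMILY_CODE.match(code)
    if match is None:
        raise ValueError(f"not a family code: {code!r}")
    doubles, triples, homers = (int(match.group(i)) for i in (1, 2, 3))
    subgroup = int(match.group(5)) if match.group(5) else None
    return {
        "doubles": doubles,
        "triples": triples,
        "homers": homers,
        "grade": match.group(4),
        "subgroup": subgroup,
    }


print(parse_family("811 C"))
print(parse_family("613 D2"))
```

Read this way, 811 C is eight doubles, one triple and one homer per ten extra base hits, in the third OPS band, with no subgroup.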

                        The chart below summarizes the 96 families.   The players listed here are generally the best players in the family, or the most recognizable names, and they tend to be a little better than the center of the group.   Each group represents a range of ability; these players represent the top end of the range.   I was trying to use the most recognizable names to characterize the skills of the group, and the most recognizable are more recognizable because they were better.

                        For a lot of the families, a player might be a regular if he was an infielder or a catcher, but a part-time player if he was an outfielder.  Thus, group 514 D2 includes Gary Gaetti, Bill Freehan, Earl Battey and Benito Santiago—regulars and even minor stars—but also includes many guys who were about the same as hitters, but weren’t true regulars because they were outfielders (Rip Repulski, Pedro Feliz, Jerry Martin, Glenn Braggs, Pedro Munoz, Matt Mieske, Carmelo Castillo.)    I’m just pointing this out to remind you that it is a classification of HITTERS, not PLAYERS. 

I’ll make notes here about the families that had an OPS over .800.   More understanding of the families can be garnered from studying the rosters in the companion article, Appendix to 96 Families.   The categories below are the code of the family (Family), the number of players in the family (#), the number of players in the family who are in the Hall of Fame (Hall), the average batting average for the players in the family (B Avg), the average On Base Percentage of the family members (OBA), and the average Slugging Percentage (Slug), followed by a few examples of the most prominent players who are in the family.

Family  #   Hall  B Avg  OBA   Slug  Leading Players

415 A   14  10    .300   .402  .566  Hank Aaron, Willie Mays, Frank Robinson, Babe Ruth
    The only players in this family who are not in the Hall of Fame are Barry Bonds, A-Rod, Ken Griffey Jr. and Dick Allen.  The family consists primarily of 50s, 60s and 70s outfielders, but also includes Mel Ott, Jimmie Foxx, and Mike Schmidt.

514 A   13  7     .318   .404  .557  Stan Musial, Lou Gehrig, Joe DiMaggio, Larry Walker
    This group is dominated by 1930s superstars, including Hack Wilson and Hank Greenberg.  Larry Walker and Brian Giles have joined the group in recent years.

505 A   19  1     .300   .397  .559  Ted Williams, Gary Sheffield, Frank Thomas
    The reason these players aren't in the Hall of Fame is that they are almost all active or recently retired.  Bagwell, Chipper, Vladimir, Juan Gone and Jim Edmonds are in this group, as is Manny Ramirez, although we all know that Manny is truly one of a kind.

406 B   17  2     .271   .368  .517  Sammy Sosa, Willie McCovey, Harmon Killebrew
    The "0" in the triples column, plus more homers than doubles, usually indicates a slower player.  These are the slugging first basemen and corner outfielders--Colavito, Hank Sauer, Jay Buhner, Jim Gentile.

721 B   8   6     .338   .401  .477  Tris Speaker, Nap Lajoie, Paul Waner, Ed Delahanty
    This group begins with Cap Anson and ends with Paul Waner.

631 B   14  10    .335   .412  .464  Ty Cobb, Honus Wagner, Eddie Collins, Jesse Burkett
    The last player in this group was Earle Combs, retired 1935.  The non-Hall of Famers in the family are Joe Jackson, Tip O'Neill, Dave Orr and John McGraw, who of course is in the Hall of Fame as a manager, and was a great player as well.

523 B   21  9     .311   .385  .489  Roberto Clemente, Rogers Hornsby, Al Simmons
    Curtis Granderson is in this group as of now, which, if he stays there, would make him the first player in the family since Clemente.  But that grouping could change as Granderson ages and slows down, leading to fewer triples.

604 B   22  1     .295   .374  .497  Carl Yastrzemski, Jeff Kent, Will Clark
    Many young players in this family now--David Wright, Garrett Atkins, Miguel Cabrera, Victor Martinez.  Also many active veterans.

712 B   8   2     .322   .392  .474  Tony Gwynn, Mickey Cochrane, Riggs Stephenson
    Surprisingly, despite the high average, this group has mostly players who were short-term regulars--Dale Alexander, Earl Webb, Ike Boone, Babe Phelps.  It is not a cohesive family, and I wonder whether I should have broken it up and assigned the players elsewhere.

613 B   24  7     .306   .379  .485  George Brett, Luis Gonzalez, Joe Medwick, Derek Jeter
    Bernie Williams, Minnie Minoso, Babe Herman, Bobby Abreu, Nomar Garciaparra, Jackie Robinson, Robinson Cano, Hanley Ramirez.

415 B   10  3     .273   .364  .499  Reggie Jackson, Eddie Mathews, Willie Stargell
    Modern slugging outfielders, mostly.  Eric Davis and his BFF Darryl Strawberry are both here.

514 B   31  2     .286   .370  .492  Al Kaline, Billy Williams, Dwight Evans, Moises Alou
    Brad Hawpe, J. D. Drew, Jason Bay and Carlos Beltran are in this family.

622 B   19  9     .317   .392  .469  Charlie Gehringer, George Sisler, Enos Slaughter
    1920s Hall of Famers.  Enos Slaughter is the only member of this group to have played since Charlie Gehringer retired.

505 B   25  2     .277   .362  .493  Eddie Murray, Fred McGriff, Orlando Cepeda, Aramis
    Justin Morneau, Pat Burrell and Ryan Klesko.

703 C   16  0     .287   .362  .444  Craig Biggio, John Olerud, Don Mattingly, Sean Casey

415 C   36  2     .263   .340  .463  Ernie Banks, Yogi Berra, Willie Horton
    In this case the superstars (Banks and Berra) are very atypical of the group.  The core of this family is .280 hitters with 30 homers a year--Joe Adcock, Roy Sievers, the 1950s Frank Thomas, John Mayberry, Jim Lemon, Jesse Barfield, Bob Allison, Wally Post, Dick Stuart, Jim Ray Hart, Tony Conigliaro, Nate Colbert, Bob Cerv.

514 C1  51  1     .264   .351  .449  Dave Winfield, Chili Davis, Ron Santo, Joe Carter
    Short essay.  It is odd to see Ron Santo, with a good on base percentage (.362), listed here with Joe Carter, who would swing at anything.  They are otherwise similar--.270, 30 homers, 110 RBI; that's Carter or Santo.  I have tried before to group players into families based on strikeout and walk frequencies, but that doesn't work, either; that leads to unlike hitters being grouped together because they have similar strikeout/walk ratios.  If you sort players on BOTH the extra base hit ratio and the strikeout/walk ratio, you wind up with a lot of families of three players.  There just doesn't seem to be a perfect way to do it.

406 C   10  0     .247   .330  .470  Boog Powell, Dave Kingman, Cecil Fielder
    Leon Wagner, Gus Zernial, Gorman Thomas, Ron Kittle.

505 C   47  2     .262   .340  .459  Don Baylor, Gary Carter, Johnny Bench, Jermaine Dye
604 C   61  2     .274   .346  .451  Cal Ripken, Steve Garvey, Paul O'Neill, Ernie Lombardi
622 C   45  4     .295   .362  .435  Zack Wheat, Tim Raines, Kenny Lofton, Ichiro Suzuki
613 C2  48  4     .292   .351  .445  Paul Molitor, Robin Yount, Vada Pinson, Al Oliver
532 C   9   0     .295   .352  .442  Harry Stovey, Buck Freeman, Carl Crawford
811 C   8   4     .306   .376  .418  Wade Boggs, Billy Herman, Joe Sewell, Lou Boudreau
721 C   42  7     .303   .373  .421  Rod Carew, Frankie Frisch, Richie Ashburn
631 C   28  5     .301   .372  .421  Willie Keeler, Pie Traynor, Edd Roush, Jake Beckley
514 C2  52  3     .277   .341  .451  Harold Baines, Andre Dawson, Tony Perez, Carlton Fisk
613 C1  48  2     .275   .358  .435  Rickey Henderson, Roberto Alomar, Joe Morgan
523 C   11  0     .277   .356  .436  Andy Van Slyke, Wally Moon, Bob Skinner
712 C   33  1     .294   .370  .422  Pete Rose, Mark Grace, Keith Hernandez, George Kell
514 W   19  0     .245   .358  .429  Dick McAuliffe, Darren Daulton, Brad Wilkerson
541 D   21  2     .282   .349  .396  Sam Crawford, Buck Ewing, Lance Johnson
604 D   69  0     .256   .323  .416  Tim Wallach, Larry Parrish, Bret Boone, Terry Steinbach
613 D1  55  0     .255   .341  .398  Tony Phillips, Jay Bell, Toby Harrah, Devon White
820 D   8   0     .294   .358  .378  Charlie Jamieson, Johnny Pesky, Bucky Harris
703 D   37  0     .270   .335  .401  Gregg Jefferies, Carlos Baerga, Wil Cordero
532 D   14  0     .276   .334  .402  Owen Wilson, Casey Stengel, Deion Sanders
514 D2  41  0     .261   .315  .420  Gary Gaetti, Benito Santiago, Juan Encarnacion
514 D1  40  0     .248   .333  .402  Ruben Sierra, Sal Bando, Davey Lopes
730 D   17  2     .293   .358  .376  Sam Rice, Hughie Jennings, Juan Pierre
622 D   71  2     .275   .341  .392  Lou Brock, Pee Wee Reese, Mickey Rivers, Ralph Garr
613 D2  58  0     .265   .329  .403  Alan Trammell, Marquis Grissom, Joe Rudi
712 D2  61  1     .282   .336  .397  Bill Buckner, Red Schoendienst, Tony Fernandez
631 D1  43  2     .274   .356  .376  Harry Hooper, Brett Butler, Bid McPhee
631 D2  46  1     .290   .345  .387  Lloyd Waner, Willie Wilson, Hal Chase, Manny Mota
721 D1  62  6     .279   .357  .373  Max Carey, Phil Rizzuto, Willie Randolph, Junior Gilliam
523 D   24  0     .263   .330  .399  Willie Davis, Jim Fregosi, Juan Samuel
406 D   12  0     .229   .312  .417  Rob Deer, Steve Balboni, Dave Duncan
613 D3  58  1     .274   .326  .403  Brooks Robinson, Buddy Bell, Carney Lansford
505 D   43  0     .243   .315  .413  Graig Nettles, Lance Parrish, Tony Batista, Joe Crede
721 D2  62  2     .289   .341  .385  Nellie Fox, Jimmy Collins, Willie McGee, Matty Alou
712 D1  62  0     .267   .342  .383  Jimmie Dykes, Edgar Renteria, Orlando Cabrera
811 D   33  1     .286   .350  .375  Dick Bartell, Rick Ferrell, Billy Goodman, Johnny Ray
415 D   34  0     .240   .311  .409  Tony Armas, Joe Pepitone, Woodie Held
721 W   14  0     .253   .381  .328  Eddie Yost, Eddie Stanky, Max Bishop
640 E   9   0     .279   .344  .364  Harry Bay, John Coleman, Tim O'Rourke
802 E   18  0     .263   .330  .352  Ken Reitz, Dave Magadan, Rich Dauer, Brent Mayne
613 E1  59  0     .241   .312  .361  Paul Blair, Jim Sundberg, John Roseboro, Alan Ashby
910 E   9   0     .273   .339  .334  Muddy Ruel, Johnny Bassler, Pinky May
721 E2  57  2     .265   .322  .351  Luis Aparicio, Bobby Wallace, Billy Jurges
721 E1  53  1     .259   .335  .337  Roger Peckinpaugh, Johnny Evers, Harold Reynolds
514 E   48  0     .239   .299  .373  Clete Boyer, Jim Spencer, Steve Yeager, Craig Paquette
712 E1  53  0     .248   .325  .347  Royce Clayton, Brad Ausmus, Chris Speier
622 E   67  0     .259   .312  .359  Bert Campaneris, Tony Taylor, Bill Virdon, Neifi Perez
613 E2  59  1     .254   .304  .366  Bill Mazeroski, Frank White, Leo Cardenas, Angel Berroa
703 E   34  0     .246   .305  .364  Terry Kennedy, Dan Wilson, Rick Dempsey, Pat Borders
721 E3  57  0     .273   .317  .351  Garry Templeton, Bill Russell, Dave Cash
631 E   81  2     .263   .320  .347  Maury Wills, Joe Tinker, Monte Ward, Omar Moreno
811 E   77  1     .266   .325  .342  Omar Vizquel, Ozzie Smith, Dick Groat
712 E3  54  0     .263   .310  .356  Tony Pena, Enos Cabell, Manny Trillo, Tony Kubek
712 E2  54  0     .255   .314  .352  Dave Concepcion, Steve Sax, Bob Boone
604 E   22  0     .236   .298  .368  Bo Diaz, Joe Oliver, Dave Valle
730 E   47  1     .266   .327  .337  Rabbit Maranville, Eddie Foster, Shano Collins
541 E   19  0     .257   .320  .344  Harry Lord, John Hummel, Greasy Neale
523 E   20  0     .238   .300  .357  Roy Smalley Sr., Jake Wood, Larry Stahl
820 E   49  1     .263   .327  .329  Milt Stock, Sparky Adams, Luke Sewell (Ray Schalk)
532 E   11  0     .238   .300  .340  Tom Brown, Fred Pfeffer, Dick Johnston
901 E   8   0     .260   .315  .324  Jody Reed, Marty Barrett, Mike Redmond
730 F   39  0     .249   .306  .305  Larry Bowa, Bud Harrelson, Kid Gleason
721 F1  49  0     .238   .305  .305  Freddie Patek, Julio Cruz, Jose Uribe
721 F2  49  0     .248   .291  .315  Don Kessinger, Ozzie Guillen, Alfredo Griffin
820 F   49  0     .248   .301  .304  George Stovall, Tommy Thevenow, Frank Taveras
811 F   49  0     .244   .297  .307  Tim Foli, Mark Belanger, Horace Clarke
712 F   61  0     .234   .292  .312  Ed Brinkman, Bucky Dent, Dick Schofield Sr.
802 F   12  0     .237   .296  .307  Bob Swift, Glenn Hoffman, Johnny Oates
613 F   36  0     .225   .280  .322  Aurelio Rodriguez, Bobby Knoop, Phil Roof
703 F   14  0     .222   .280  .319  Buck Martinez, Matt Walbeck, Bob Melvin
631 F   59  0     .237   .289  .309  Tommy Corcoran, Bones Ely, Wid Conroy
622 F   19  0     .229   .288  .309  George Strickland, Dave Nelson, Herm Winningham
541 F   22  0     .228   .272  .304  Pop Corkhill, Bill Kuehne, Billy Maloney
640 F   14  0     .234   .277  .293  Roger Metzger, Rodney Scott, Sadie Houck
910 F   21  0     .239   .284  .284  Skeeter Newsome, Felix Fermin, John Peters
721 G   22  0     .220   .261  .278  Doug Flynn, Dal Maxvill, Al Weis
811 G   18  0     .220   .265  .269  Hal Lanier, Tim Cullen, Jeff Torborg
730 G   15  0     .220   .261  .271  George McBride, Joe Gerhardt, Bill Bergen
820 G   30  0     .221   .263  .263  Lee Tannehill, Bill Killefer, Charley O'Leary

Post Script:

After filing this article, three ways occurred to me that I could have done this better.   First, I should have added to the definition of a “family” that a family of players should be defined by as many shared characteristics as can be identified for a suitable group.      Second, consistent with that change, I should have changed the parameters of a family from eight to eighty players to eight to twenty players—thus adding additional identifying patterns to the larger groups to break them down into tighter “families”.    In other words, take the 49 players in 811 F and break them down into three groups based on strikeout/walk ratios and/or speed, thus making more compact, more unified families.

Third, I should have “named” the families after their most prominent players, as I have done informally before, rather than allowing the families to be defined by codes.   If I had identified group 703 F as the Buck Martinez/Bob Melvin family, for example, the average fan would have been much more able to relate to that, more able to understand what I was doing.

                        If I had made these changes it would have taken me another two or three days to finish the research.   I didn’t do these things, I think, in part because I hadn’t posted anything for a long time, and I was feeling some pressure to get something up.  That’s not a good reason, but I think that’s what happened.  

 
 
 

COMMENTS (11 Comments, most recent shown first)

PeteRidges
I think there are 66 possible ten-patterns, not 55.

I was disappointed to see that there are no 10-0-0 players, so I'll mention Jo-Jo Morrissey (1932-36): only 897 PA, but 31 doubles, one triple and zero home runs. If you're too slow to triple, and you've no power, it's not easy to stretch a single into a double.
8:32 AM Sep 1st
 
barronmo
I wonder if the size of a family might tell us something? For example, if a player comes along and joins a small family could we make certain predictions about the future? If someone shows a rare set of skills, perhaps it's less likely they would lose those skills? Or maybe the opposite is true and those who initially join small groups tend to fall out of those groups into larger ones in predictable ways. I'm sure this is one of your reasons for doing the grouping; how players change groups with time would likely be pretty interesting.
12:34 AM Aug 31st
 
jollydodger
Thank you for feeling compelled to get something posted. It's nice to know that you feel some urgency to produce something for us.
8:28 PM Aug 29th
 
bgorden
This is really interesting. It's both a sidelight to baseball history and a recognition of players' skills in a whole new way. I've long been interested in the family of players whose main offensive ability is the ability to draw walks: John McGraw, Miller Huggins, Jimmy Sheckard, Max Bishop, Eddie Yost, Eddie Stanky, etc.

It's significant that a lot of these guys were 2nd basemen and/or became managers.
I live in Portland OR and the local AAA team has a 2nd baseman named Matt Antonelli who is hitting .215 (he was on the interstate most of the season) but he has 75 walks, which has kept him in the lineup.
2:38 AM Aug 29th
 
ksclacktc
Bill,

Another way that might have possibilities is to look at a ratio of "old" bases to "young" bases. You have outlined a method for this somewhere else. I was fooling around with this a little and I like the results. I figured the ratio for all players since 1900 with 1000 TPA. I then sorted them into 10 equal groups based on percentiles. If you look at the list of Top 100 SS from the BJHBA you get these results:

10- Wills, Campaneris, Tinker
9- O.Smith, Aparicio, Maranville
8- Fernandez, Concepcion
7- Yount, Larkin
6- Vaughan, Trammell, Jeter
5- Cronin, Reese , Boudreau, Pesky
4- Fregosi, J.Bell, Logan
3-
2- Ripken, Banks, ARod, Stephens
1- Petrocelli

There may be a way to tweak this further, but I have to get going. Interesting to note: The guys at the low end, the 1s & 2s, are guys who may have been better off as 3B, played a lot of 3B, or at least were not a classic type of SS and were better defensively than people thought. Also, Wagner as a deadballer fits in at the top of the spectrum with the Wills and Campaneris types. Not sure what to make of that.

Dave
1:52 PM Aug 28th
 
tbell
Love it.

I do think the breakdown by hitter quality is too narrow (has too many categories). A hitting style (style of output rather than swing style) is what it is, regardless of hitting quality. One of the fascinations of this project is to group players one would not otherwise think of linking together. The world is full enough of ratings of hitting quality/value; it would be nice for this classification of hit output style to be a little less defined by overall hitter quality.

Reducing the quality subcategories would still provide useful information. For example, I'd bet all 3-0-7 hitters are strong candidates to hit into a double play, whether they are 3-0-7 A or 3-0-7 G.

So I would rather see a more gross classification by quality … perhaps just three levels (A-B-C). If this produces groups too large to be useful to a researcher, I'd rather see further breakdowns by the property being classified: style of batting output. In other words, strikeouts, walks, groundball/flyball ratio, GIDP, etc.

Also, I don't quite understand the wish to bend/conflate categories so they are of broadly similar size. An effective classification system should recognize unique entities as unique. If your system convincingly identifies a hitter as unique – the system has worked. But you have treated such instances as indications for system repair.

For me, this waters down the system, and diminishes its interest. On the other hand, I am not a baseball researcher, and don't have any interest in using these groups as the basis for further research.
11:18 AM Aug 28th
 
ksclacktc
Bill, Great idea! I think if you further sub-divide by handedness and primary position, you will create interesting sub groups of similar players.
9:06 AM Aug 28th
 
Steven Goldleaf
Not to over-complicate an already overly complex system of classification, but I'd think that handedness would be a basic requirement of any "family" grouping. To take Bill's example, if you're trying to extrapolate players who hit into DPs (without DP data) one of the first things I'd want to know is "Did he hit right-handed?" It's tough to group righties with lefties to make general statements.
7:45 AM Aug 28th
 
elricsi
I love breaking the extra base hits down like that. I'm going to start using those codes, and I think that part will catch on. I am a bit stunned you used OPS and not OPS+.
12:29 AM Aug 28th
 
THBR
Damn, ANOTHER fascinating Bill James article that gives me a lot to think about. First comment without deep reflection: take your time, Bill, because this looks like it could be significant. I EAGERLY await the "next cut" article.
10:17 PM Aug 27th
 
Trailbzr
Since the numeric part already measures power, would making the letter OBP help?
7:55 PM Aug 27th
 
 