Faux Edgar

June 19, 2007
All of baseball history is much the same to me. Whether something happened in 2006 or 1941 or 1974 or 1912 means very little to me; I am as likely to be aware of it and as likely to take an interest in it in one era as I am in another. I understand that this is an unusual perspective on the sport. Most normal people, whose lives are not utterly consumed by baseball history, are more likely to be interested in players that they actually saw than in players who were dead before their grandfather was born. I’m always trying to remind myself to talk about the players that other people can relate to.

So I was writing about Zeke Bonura, and I got to thinking about Edgar Martinez. Edgar, like Zeke:
    a) was a tremendous hitter,
    b) was a lousy fielder, and
    c) had some great seasons, but not enough of them to make a Hall of Fame career.
It should have been a Hall of Fame career. It wasn’t a Hall of Fame career basically because the Mariners in the late 1980s had their head up their ass, pardon my french. Edgar reached Triple-A in 1985, hitting .353 at Calgary that year (20 games), then hitting .329 there in ’87, .363 in ’88. In his first major league trial he hit .372 in 13 games in ’87, then .281 in 14 games in ’88. . .altogether he was 25 for 75, .333. He was a brutal third baseman, however, and, when he didn’t hit much early in ’89, he found himself back at Calgary, where he hit .345 in 1989. As the Mariners saw it, he was a bad third baseman and a minor league hitter. Umm. . .you know you have a designated hitter rule, don’t you?

So anyway, Edgar struggled to play third base until he was past 30, at which time he became a DH, the best DH in the game, as a matter of fact. In 1995 he hit .356 with 52 doubles, 29 homers, 113 RBI. Zeke Bonura could only dream of such numbers. That was his best season, but not by much; he hit over .300 and drove in over 100 runs six times. His lowest on-base percentage in any of those seasons was .423.

So let’s take Edgar’s six hundred-RBI seasons as his defining years, and see who else was wearing an Edgar suit. These are the stats for Edgar’s six defining seasons:

YEAR    G   AB    R    H  2B  3B  HR  RBI   BB  SO  SB  CS   Avg   OBA  SPct    OPS
1995  145  511  121  182  52   0  29  113  116  87   4   3  .356  .479  .628  1.107
1996  139  499  121  163  52   2  26  103  123  84   3   3  .327  .464  .595  1.059
1997  155  542  104  179  35   1  28  108  119  86   2   4  .330  .456  .554  1.009
1998  154  556   86  179  47   1  29  102  106  96   1   1  .322  .429  .567   .995
2000  153  556  100  180  31   0  37  145   96  95   3   0  .324  .425  .579  1.004
2001  132  470   80  144  40   1  23  116   93  90   4   1  .306  .423  .543   .966


Games: 132 to 155
At Bats: 470 to 556
Runs: 80 to 121
Hits: 144 to 182
Doubles: 31 to 52
Triples: 0 to 2
Home Runs: 23 to 37
RBI: 102 to 145
Walks: 93 to 123
Strikeouts: 84 to 96
Stolen Bases: 1 to 4
Batting Average: .306 to .356
On Base Percentage: .423 to .479
Slugging Percentage: .543 to .628

The man could really hit.
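Each range above is just the lowest and highest value of one statistic across the six defining seasons. Below is a minimal Python sketch of that step, using the numbers from the table. The names `STATS`, `ROWS` and `defining_ranges` are illustrative and do not come from any existing tool. Caught stealing is left out because it is not one of the fourteen parameters.

```python
# Edgar Martinez's six defining seasons (1995-98, 2000, 2001), from the table above.
# Caught stealing is omitted: it is not one of the fourteen parameters.
STATS = ["G", "AB", "R", "H", "2B", "3B", "HR", "RBI", "BB", "SO", "SB", "Avg", "OBA", "SPct"]

ROWS = [
    (145, 511, 121, 182, 52, 0, 29, 113, 116, 87, 4, .356, .479, .628),  # 1995
    (139, 499, 121, 163, 52, 2, 26, 103, 123, 84, 3, .327, .464, .595),  # 1996
    (155, 542, 104, 179, 35, 1, 28, 108, 119, 86, 2, .330, .456, .554),  # 1997
    (154, 556, 86, 179, 47, 1, 29, 102, 106, 96, 1, .322, .429, .567),   # 1998
    (153, 556, 100, 180, 31, 0, 37, 145, 96, 95, 3, .324, .425, .579),   # 2000
    (132, 470, 80, 144, 40, 1, 23, 116, 93, 90, 4, .306, .423, .543),    # 2001
]


def defining_ranges(rows, stats=STATS):
    """Map each stat to (lowest, highest) across the defining seasons."""
    return {
        stat: (min(row[i] for row in rows), max(row[i] for row in rows))
        for i, stat in enumerate(stats)
    }


edgar = defining_ranges(ROWS)
for stat, (low, high) in edgar.items():
    print(f"{stat}: {low} to {high}")
```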
Edgar, it turns out, is entirely unique. . ..and we don’t even have to go to the end of the process to figure this out. Once we have done the elimination for strikeouts, the only players left who fall within the parameters outlined above are:
Edgar Martinez, 1995
Edgar Martinez, 1996
Edgar Martinez, 1997
Edgar Martinez, 1998
Edgar Martinez, 2000
Edgar Martinez, 2001

And if we didn’t do any elimination for strikeouts, Edgar would be entirely unique anyway; all of the players who are eliminated for strikeouts would also have been eliminated for batting average. Or for on-base percentage.

On the other hand, at this point we haven’t really proven that these players have unique skill sets. It could be, alternatively, that this process we are using simply makes them appear to be unique. It could be that if you take six regular players at random and define the group by the highest and lowest totals of those six, the mathematics of the process are such that everybody who isn’t used to define the group will tend to be eliminated from the group. In other words, if you start out with 72,905 players and eliminate 30% of them here, 40% there, 70% there and repeat these eliminations 14 times, you’re going to wind up with nobody.

Let’s look at that, starting with the math. There are 72,905 batter/seasons in major league history, through 2006. If you start with that number of hitters and eliminate 10% of them in each cycle and you have 14 elimination cycles, you’ll wind up with 20,591 players. If you eliminate 20% of them in each cycle, you’ll wind up with 5,010 players. If you eliminate 30% in each cycle, you’ll wind up with 1,010 players. If you eliminate 40% in each cycle, you’ll wind up with 159 players. If you eliminate 50% in each cycle, you’ll wind up with 18 players.

If you eliminate 60% in each cycle, you’ll probably wind up with one player left. But if you eliminate 70% in each cycle, 96% of the time you’ll wind up with nobody.
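The arithmetic in the last two paragraphs can be sketched in a few lines of Python. The sketch assumes, as those paragraphs implicitly do, that every season survives each cycle independently with the same probability. Real cuts are correlated (a season that passes the at-bats filter is more likely to pass the hits filter), so treat this as a rough model. The function names are hypothetical.

Run with `cycles=12`, the sketch reproduces the figures above: 20,591 at 10%, 5,010 at 20%, 159 at 40%, 18 at 50%, about one at 60%, and nobody 96% of the time at 70%. With `cycles=14`, the survivors shrink further, to about 16,678 at 10% and about 57 at 40%.

```python
def expected_survivors(pool: int, cut: float, cycles: int) -> float:
    """Expected seasons left when each cycle removes the fraction `cut`.

    Simplifying assumption: every season survives each cycle independently
    with probability 1 - cut.
    """
    return pool * (1 - cut) ** cycles


def chance_nobody_left(pool: int, cut: float, cycles: int) -> float:
    """Probability that no season survives, under the same assumption."""
    survive_all = (1 - cut) ** cycles
    return (1 - survive_all) ** pool


POOL = 72_905  # batter-seasons through 2006, per the text

for cycles in (12, 14):
    for cut in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7):
        print(
            f"{cycles} cycles, {cut:.0%} cut: "
            f"{expected_survivors(POOL, cut, cycles):,.0f} expected, "
            f"P(nobody) = {chance_nobody_left(POOL, cut, cycles):.2f}"
        )
```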

In other words, if you eliminate 60% to 70% of the players in each step of this process, you’ll wind up with nobody except those who define the group—whether or not the group is actually unique. So we could be finding players as unique here not because they actually are unique, but simply because we’re eliminating 60-70% of the players in each step, or some equivalent combination—30% one time, 85% the next.

Let’s try a random experiment. Let’s form a group not by one player, but by six players. . .let’s say, six players chosen at random who all played between 140 and 162 games.

OK, the six players chosen at random are:
    Elbie Fletcher, 1937
    Amos Otis, 1971
    Davey Lopes, 1975
    Andre Dawson, 1980
    Darnell Coles, 1986
    Jose Vidro, 2002
Actually it’s not entirely at random; I disqualified the early players for whom we don’t have strikeout data. Anyway, the parameters of this group are:
    Games: 142 to 155
    At Bats: 521 to 618
    Runs: 56 to 108
    Hits: 133 to 190
    Doubles: 22 to 43
    Triples: 2 to 7
    Home Runs: 1 to 20
    RBI: 38 to 96
    Walks: 40 to 91
    Strikeouts: 64 to 93
    Stolen Bases: 2 to 77
    Batting Average: .247 to .315
    On Base Percentage: .321 to .378
    Slugging Percentage: .308 to .492
I’m assuming that the purpose of this experiment is obvious, but maybe I should explain. If six players chosen at random form a group that excludes everybody except those six players, then the process is eliminating everybody not originally included not because those players are unique, but simply because this process tends to eliminate everybody not originally included in the group. If, on the other hand, this group of six randomly selected players picks up 25, 50 or 200 other players within its borders, then this suggests that the Bonura and Edgar Martinez groups found only Bonura and Edgar Martinez because Bonura and Edgar Martinez are truly unique, at least to some extent.
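The control experiment can be sketched as a function that does three things:

- picks six seasons at random from an eligible subset,
- bounds the group by their lows and highs,
- counts how many other seasons fall inside every bound.

If counts like this routinely come back in the dozens or hundreds, an empty result for Edgar Martinez means something. If they routinely come back zero, the method is manufacturing uniqueness. The season records (dicts keyed by stat abbreviations), the `eligible` filter and the function name are all hypothetical, for illustration only.

```python
import random
from typing import Callable, Dict, List, Optional

Season = Dict[str, float]  # hypothetical record: stat abbreviation -> value


def random_control(
    pool: List[Season],
    stats: List[str],
    eligible: Callable[[Season], bool] = lambda s: True,
    k: int = 6,
    seed: Optional[int] = None,
) -> int:
    """Pick k eligible seasons at random, bound the group by their lows and
    highs on every stat, and count the other seasons inside all the bounds."""
    rng = random.Random(seed)
    candidates = [i for i, season in enumerate(pool) if eligible(season)]
    chosen = set(rng.sample(candidates, k))
    group = [pool[i] for i in chosen]
    bounds = {
        stat: (min(g[stat] for g in group), max(g[stat] for g in group))
        for stat in stats
    }
    return sum(
        1
        for i, season in enumerate(pool)
        if i not in chosen
        and all(low <= season[stat] <= high for stat, (low, high) in bounds.items())
    )
```

For the first experiment, `eligible` would be something like `lambda s: 140 <= s["G"] <= 162`, with `pool` holding every batter-season being tested.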

Because the percentages are critical to understanding this process, I also tracked what percentage of players were eliminated by each set of parameters. The “games” definition—142 to 155 games played—eliminated 92.7% of the original group of 72,905 seasons. Also, we happened to have here a fairly tight group of strikeouts, 64 to 93 strikeouts. That eliminated 65% of the players who were included up to that point. But many of the other parameters eliminated only 10 to 20% of the previously included players, thus ultimately failing to drive the number of included players near zero. Runs Scored: 56 to 108. That eliminated only 17% of the players who were in the group after games and at bats. Hits: 133 to 190. That eliminated only 10%. Stolen Bases: 2 to 77. That, surprisingly, eliminated 11% of the players in the group, all of those being players who had 0 or 1 stolen base. . .everybody who had more than 77 had already been eliminated.
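The per-step percentages can be tracked with a filter that records, for each parameter in turn, what share of the seasons still in the group that parameter removes. This is how the 92.7%, 65%, 17% and 10% figures above are stated. Below is a minimal sketch, assuming seasons are dicts keyed by stat abbreviations and that the (insertion-ordered) dict of ranges gives the order of the steps. The function name and the toy seasons are made up for illustration.

```python
from typing import Dict, List, Tuple

Season = Dict[str, float]  # hypothetical record: stat abbreviation -> value


def eliminate(
    seasons: List[Season], ranges: Dict[str, Tuple[float, float]]
) -> Tuple[List[Season], List[Tuple[str, float]]]:
    """Apply each (low, high) range in dict order, inclusive at both ends.

    Returns the surviving seasons and, for each step, the percentage of the
    seasons still in the group before that step which the step removed.
    """
    alive = list(seasons)
    log = []
    for stat, (low, high) in ranges.items():
        before = len(alive)
        alive = [s for s in alive if low <= s[stat] <= high]
        pct = 100.0 * (before - len(alive)) / before if before else 0.0
        log.append((stat, pct))
    return alive, log


# Made-up toy seasons, only to show the log format.
toy = [{"G": 150, "SO": 70}, {"G": 100, "SO": 70}, {"G": 145, "SO": 120}, {"G": 150, "SO": 90}]
survivors, log = eliminate(toy, {"G": (142, 155), "SO": (64, 93)})
for stat, pct in log:
    print(f"{stat}: eliminated {pct:.1f}%")  # G: 25.0%, then SO: 33.3%
```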

I’m starting to gain an intuitive understanding of the math here. In order to eliminate almost everybody, we have to eliminate 60-70% at each step, although 90% in one step and 30% in the next is not at all the same as 60% in each step. But six players, chosen at random, will tend to cover. . .what percentage of the spectrum of players? It would be the same here as it would be for the IQs of a group of random accountants, for example, or the body fat percentages of a group of randomly chosen septuagenarians. I’m sure there is a mathematician/statistician out there who can tell me exactly how to calculate this, but intuitively, six players chosen at random are probably going to cover 70 or 80% of the spectrum top to bottom most of the time, thus will tend to eliminate only 20% to 30%. If you pick six septuagenarians off the sidewalk, you’ll probably get one really fat one and one really thin one. Thus, we very easily could do this again and wind up with 1,000 or even 5,000 players left in the group. On the other hand, if you repeat that process fourteen times, at least once in the fourteen you’ll probably pick six individuals who only cover a small part of the spectrum.
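The 70 or 80% guess can be checked against a standard result from order statistics. For any continuous statistic, the expected share of the population lying between the lowest and highest of n independent random draws is (n - 1)/(n + 1). For six players that is 5/7, about 71%, so each step would be expected to eliminate about 29%.

The sketch below computes the formula and confirms it by simulation. Uniform draws suffice, because the population share between two draws from any continuous distribution behaves like the gap between two uniform draws.

Two caveats apply:

- Discrete stats such as triples or stolen bases produce ties, and inclusive ranges then keep extra seasons at the boundaries.
- When the six are drawn from a narrower pool than the one being filtered (as with the 140-to-162-game requirement), a step like games can eliminate far more, as the 92.7% figure shows.

```python
import random


def expected_coverage(n: int) -> float:
    """Expected population share between the min and max of n draws
    from any continuous distribution: (n - 1) / (n + 1)."""
    return (n - 1) / (n + 1)


def simulated_coverage(n: int, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check with uniform draws, where the share between the
    min and max is simply max - min."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        draws = [rng.random() for _ in range(n)]
        total += max(draws) - min(draws)
    return total / trials


n = 6
print(f"formula:   {expected_coverage(n):.3f}")  # 0.714, i.e. 5/7
print(f"simulated: {simulated_coverage(n):.3f}")
```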

Let’s try it again. . ..this time I chose at random six players who had 350 to 650 at bats, who happened to be:
    Ned Hanlon, 1891
    Tommy Henrich, 1942
    Luis Aparicio, 1973
    Mike Lum, 1973
    Kevin McReynolds, 1987
    Pedro Guerrero, 1990
The parameters formed by these six players are:
    119 to 151 games
    455 to 590 at bats
    42 to 87 runs scored
    121 to 163 hits
    12 to 32 doubles
    1 to 8 triples
    0 to 29 home runs
    49 to 95 RBI
    39 to 58 walks
    30 to 89 strikeouts
    1 to 54 stolen bases
    .266 to .294 batting average
    .318 to .352 on-base percentage
    .309 to .495 slugging percentage
This time I wind up with 175 players—suggesting again that a group of players chosen at random is unlikely to exclude everybody not used to define the group.

OK, I’m enjoying this, but. . .I’ve got work to do, and I’d better go do some of it. A few thoughts on this process in closing, and then I’ll come back to it later.

1) I am trying to take an accidental discovery here and convert it into an actual tool that we can use to analyze baseball.
2) This is what I have done for thirty years: taking random observations and trying to shape them into analytical tools. The only difference is that, since I have this on-line now, I am doing in public some of the work that I might once have done in private.
3) There’s a great deal to be done before this becomes a useful tool, and, frankly, it seems more likely headed for the “this is fun to fart around with” category than the “we can actually learn something by doing this” category. But then, the hitter projection system and the similarity scores system started out as just fun-to-fart-around-with stuff, and those have now been imitated by countless other analysts and converted into solid analytical tools, so who knows?
4) There are several accidents of this process which are in danger of becoming calcified in the system. I started with six seasons for Zeke Bonura, and now I’m using six seasons as the standard, which probably doesn’t make sense, since not everybody is going to have six defining seasons. I started out with 14 “elimination steps” and am now in danger of developing a system that only works with 14 steps, so I need to be careful about that, since in some cases some other number might be more appropriate.
5) Since the system is more likely to be useful if it actually finds qualifying seasons than if it excludes them all, it might be more useful if we “systematically stretch” the parameters than if we limit them. In other words, since Edgar Martinez had 84 to 96 strikeouts in all of his defining seasons and since this unrealistically and artificially limits the group, it might be more useful if we defined his group as 79 to 101 strikeouts or something. But I have no idea how to systematically stretch the parameters.

6) There is probably some process that can be developed in which the standard number of defining seasons is ten—thus stretching the parameters by including more seasons—and, for players who don’t have ten defining seasons, we stretch the parameters artificially based on how many defining seasons they do have. But again, I have no idea now what that process would be.
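One possible way to stretch the parameters borrows the usual unbiased estimate of the endpoints of a uniform range. With n defining seasons spanning a width w, extend each end by w/(n - 1). This is offered only as a sketch, not as the method items 5 and 6 are looking for.

Because the padding shrinks as n grows, players with fewer defining seasons get stretched more, which is the behavior item 6 asks for. The uniform assumption is made for illustration only; real seasons cluster toward the middle of a player's range, so this is a starting point rather than an answer.

```python
from typing import Sequence, Tuple


def stretched_range(values: Sequence[float]) -> Tuple[float, float]:
    """Widen the observed (low, high) by w / (n - 1) at each end.

    Assumes (for illustration only) that the defining seasons behave like
    n random draws from a uniform range; under that assumption
    low - w/(n-1) and high + w/(n-1) are unbiased estimates of the range's
    true endpoints, where w = high - low.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two defining seasons")
    low, high = min(values), max(values)
    pad = (high - low) / (n - 1)
    return low - pad, high + pad


edgar_so = [87, 84, 86, 96, 95, 90]  # strikeouts, 1995-98, 2000, 2001
low, high = stretched_range(edgar_so)
print(f"{low:.1f} to {high:.1f}")  # 81.6 to 98.4
```

Applied to Edgar’s strikeouts (84 to 96 over six seasons), the rule gives 81.6 to 98.4, narrower than the 79-to-101 guess above. With only three seasons spanning the same 84 to 96, the padding would be 6 at each end (78 to 102).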

Bill James
Ft. Myers, Florida

©2024 Be Jolly, Inc. All Rights Reserved.