By Bill James

August 22, 2022

We have this problem, in my Ballpark League, of pitchers being coded with batting averages which are wildly inaccurate representations of their true batting ability. To take a case in point, some of you who are old farts like me will remember that in the early 1960s Hank Aguirre was the most notorious bad-hitting pitcher in baseball. He was supplanted later in the decade by Sandy Koufax and then by Dean Chance, but when I first started reading baseball stuff, jokes and stories about what a terrible hitter Hank Aguirre was were common. Aguirre had 388 at bats in his career and collected 33 hits for a career average of .085. He never hit a home run, and he struck out in more than 60% of his career at bats, 236 out of 388.

In 1967, however, the 36-year-old reliever had two at bats, and hit a triple. It was the only triple of his career. He hit .500 for the season—1 for 2—so Ballpark, showing a rather bizarre lack of insight into the situation, makes him a .500 hitter for the season. Since his only hit was a triple, almost every at bat on his card is a triple. This distorts his role on the team immensely, as Ballpark managers are so tempted to sneak him into the game before the pitcher’s spot is due up. It’s a BIG deal, actually, because if you’re down one run in the top of the ninth inning and you can put him in the game, then you’ve got a .500 hitter with a slugging average higher than Barry Bonds at his best leading off the bottom of the ninth inning. It makes a difference.

I use this extreme example to illustrate the concept of a "true" batting ability. Obviously Hank Aguirre’s true batting average is not .500. That’s an unreliable number. The question I am getting to is, how many at bats does it take before a player’s batting average becomes a **reliable** statement of his underlying ability? How reliable is a player’s batting average (as an indicator of his true ability) after 100 at bats, or 200 at bats, or 500 at bats?

I should stress here that I am only talking about the true ability to hit for average. Obviously there are some .270 hitters who are tremendously productive, like Mike Schmidt, and there are some .270 hitters who are not at all productive, like Manny Trillo (who was actually a .263 hitter, but you get the point). I am not dealing with that issue at all here; I am simply talking about the reliability of the batting average. I’m not talking about park effects, either.

I don’t know why it took me until I was 72 years old to study this, but this is a pretty direct, simple study that addresses that question to my satisfaction. What I did was:

1) Generated a "known" batting average for each of 1,000 simulated players. The known batting average could in theory be anywhere between .170 and .370, but tended to cluster around .270.

2) Created a randomized process in which a player would exactly match that batting average in an infinite number of at bats, but the sequence of hits and outs was random, and

3) Compared the "generated" or "output" batting average with the "known" or "input" batting average.

In other words, how long does it take for a .270 hitter to hit .270, and a .318 hitter to hit .318? How many at bats does that take?
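Steps 1 and 2 above can be sketched in a few lines of Python. The exact distribution used to cluster the "known" averages around .270 isn’t specified in the article, so the clipped normal distribution below is an assumption, not Bill’s actual generator.

```python
import random

random.seed(1)  # for reproducibility

def known_average():
    # Step 1: a "known" true batting average, clustered around .270
    # and clipped to the .170-.370 range.  The normal distribution
    # here is a guess; the article doesn't specify the generator.
    return min(.370, max(.170, random.gauss(.270, .030)))

def trial_average(true_avg, at_bats):
    # Step 2: a random hit/out sequence that matches true_avg in
    # expectation, the way a real sequence of at bats would in the
    # very long run.
    hits = sum(1 for _ in range(at_bats) if random.random() < true_avg)
    return hits / at_bats

# Step 3: compare the "output" average to the "input" average.
true_avg = known_average()
print(round(true_avg, 3), round(trial_average(true_avg, 550), 3))
```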

There are two concepts here that you could confuse if I don’t do a good job of distinguishing them. Those two concepts are **accuracy** and **reliability**. A batting average based on a handful of at bats might, in theory, be an accurate statement of the player’s true ability.

It MIGHT be, but we have no way of knowing. It’s not reliable information. Accuracy concerns the batting average of a single player. Reliability is a characteristic of a group of batting averages, which changes with the number of at bats.

Circling back to this:

2) I created a randomized process in which a player would exactly match that batting average in an infinite number of at bats, but the sequence of hits and outs was random.

There will be readers who will say to themselves, "But hitting is not random. It depends on the performance of the pitcher, the performance of the hitter. It’s a skill, not a random occurrence."

Well, yes. A series of at bats by a hitter is not a random sequence, but it has almost all of the identifiable characteristics of randomness. It’s like your dog. Somewhere else in the world there is another dog who has the same number of legs and eyes and ears as your dog, a dog who is the same height and weight as your dog, the same color and color patterns as your dog; in short, another dog which is virtually identical to your dog. He has (nearly) all the characteristics of your dog, but he is not your dog. A series of at bats by a player is not random, but, because there is a long, long series of randomizing factors that separates each at bat from the next at bat, the pattern has nearly all the identifiable characteristics of randomness.

So anyway, how long does it take before a .270 hitter can reliably be expected to hit .270?

Well, what do we mean by reliable, within this study?

"Reliability" means the frequency of accuracy. Let us say that after X number of at bats, every player’s batting average was the same as his true ability. If that were true, then we could say that the player’s known batting average is a 100% reliable indicator of his true hitting ability after X at bats.

But that means that we have to define "accuracy". If a player has a true batting level of .280 but hits .282, you wouldn’t say that was "inaccurate", would you? That’s pretty accurate.

But if a player has a true batting level of .280 and hits .325, you wouldn’t say that his batting average was an accurate measure of his batting skill, would you? I am sure you wouldn’t. He’s not *really* a .325 hitter.

I decided that, for purposes of the study, 30 points was the outside limit of what might be called "accurate." If a player who is a true .260 hitter hits .289, that’s not very accurate, but it is more accurate than if he hits .320 or .195. It’s not wildly inaccurate.

This is the scale I used. If a player’s "trial" batting average is the same as his "true" batting average, then the batting average is 100% accurate.

If a player’s trial batting average is 1 point off from his true batting average, I would score that as 97% accurate—29 out of 30.

If a player’s trial batting average is 2 points off from his true batting average, I would score that as 93% accurate—28 out of 30.

If a player’s batting average in the trial is 25 points off from his true, long-term batting average, I would score that as only 17% accurate—5 out of 30.

And if a player’s batting average was 30 points off from his true batting average, that would not be considered accurate. 29 points off, 3% accurate, but 30 points off, 0% accurate.

The reliability is the average level of accuracy, among many players, for a certain number of at bats. At 50 at bats, the average hitter prototype within the study had a batting average which was 17% accurate as a representation of his true underlying batting average. On average, if a player has 50 at bats, his batting average is going to be 25 points off from what it should be.
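The scale described above reduces to a simple formula: accuracy is the fraction of the 30-point window not consumed by the error, and reliability is that accuracy averaged over many players. A minimal sketch:

```python
def accuracy(true_avg, observed_avg):
    # Bill's 30-point scale: 0 points off = 100%, 29 off = 3%,
    # and anything 30 or more points off = 0%.
    points_off = abs(observed_avg - true_avg) * 1000
    return max(0.0, (30 - points_off) / 30)

def reliability(pairs):
    # Reliability = average accuracy across many (true, observed) pairs.
    return sum(accuracy(t, o) for t, o in pairs) / len(pairs)

print(round(accuracy(.280, .282), 2))  # 2 points off: 28/30, prints 0.93
print(accuracy(.280, .325))            # 45 points off: prints 0.0
```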

This is a chart of the reliability of batting averages up to 100 at bats:

| At Bats | Reliability | Average Error of Batting Average |
|--------:|------------:|----------------------------------|
| 10 | 8% | more than 28 points |
| 20 | 11% | more than 27 points |
| 30 | 14% | more than 26 points |
| 40 | 15% | 25-26 points |
| 50 | 17% | 25 points |
| 60 | 19% | 24 points |
| 70 | 21% | 24 points |
| 80 | 23% | 23 points |
| 90 | 25% | 22-23 points |
| 100 | 26% | 22 points |

Read "more than" into the rows from 40 at bats on as well; I skipped typing it, but you’ll know it is there. I’ll explain later.

So after 100 at bats, a player’s batting average is about 26% reliable as an indication of his true ability to hit for average, and the average hitter is more than 22 points off from his true, long-term batting average. There is another useful question we can ask here, which is "At X number of at bats, how many players will be within 30 points of their true batting average?"

At 20 at bats, 21% of players will have a batting average within 30 points of their true batting average. At 30 at bats, that’s 28%. Then, increasing it by 10 at bats at a time, the numbers are 31%, 35%, 38%, 42%, 46%, 48% and 51%. After 100 at bats, essentially one-half of players will have a batting average which is within 30 points of what they should be hitting, and about one-half will not.

The "more than" thing. . . .let’s say that at a certain level of at bats, a batting average is 10% reliable. That would mean that it is 90% inaccurate. Assuming that 30 points off is zero percent accurate, that would suggest that 10% reliable means a 27-point average error.

Yes, but not quite, because errors larger than 30 points are being effectively counted as 30 points. If a player’s batting average is 30 points away from what it should be, that is counted as zero, but if it is 40, 50 or 60 points away from what it should be, that also is counted as zero. So an average error of 27 points actually means more than 27 points. This effect lasts until 1500 to 2000 at bats. After 1500 to 2000 at bats there are no players left who have discrepancies greater than 30 points, so after 2000 at bats it is no longer "greater than 8 points"; it is just 8 points. (If I was doing the study over I would have steered around that problem, but you know. . . .at some point you have to move on to the next study.)
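A small worked example of why the cap understates the average error (the discrepancy values here are made up for illustration). Errors past 30 points contribute no more than 30, so the average you back out of the reliability figure is only a floor:

```python
# Hypothetical discrepancies, in points; two players are past the 30-point cap.
errors = [5, 12, 28, 45, 60]

# What the 30-point scale effectively "sees": errors capped at 30.
implied = sum(min(e, 30) for e in errors) / len(errors)

# The true average discrepancy.
actual = sum(errors) / len(errors)

print(implied, actual)  # 21.0 vs 30.0: the capped average understates the error
```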

Taking it up from there:

| At Bats | Reliability of Batting Average | Average Error of Batting Average | % of Players within 30 Points of True Average |
|--------:|-------------------------------:|---------------------------------:|----------------------------------------------:|
| 125 | 28% | .022 | 53% |
| 150 | 31% | .021 | 60% |
| 175 | 34% | .020 | 63% |
| 200 | 35% | .020 | 64% |
| 225 | 37% | .019 | 67% |
| 250 | 39% | .018 | 69% |
| 275 | 40% | .018 | 72% |
| 300 | 41% | .018 | 74% |

After 300 at bats, a player’s batting average is 41% reliable as an indication of what he should hit. 74% of players are within 30 points of their true ability at that point, but 30 points isn’t much of a claim; that’s a 60-point window out of a range that runs generally from .200 to .340, or 140 points.

What do you think of as a full season’s worth of at bats? How about 550? 550 at bats is about what a regular gets in a season, isn’t it? How reliable is the batting average at that point?

| At Bats | Reliability of Batting Average | Average Error of Batting Average | % of Players within 30 Points of True Average |
|--------:|-------------------------------:|---------------------------------:|----------------------------------------------:|
| 350 | 44% | .017 | 77% |
| 400 | 46% | .016 | 81% |
| 450 | 49% | .015 | 85% |
| 500 | 51% | .015 | 89% |
| 550 | 53% | .014 | 90% |
| 600 | 55% | .013 | 91% |
| 650 | 57% | .013 | 93% |
| 700 | 58% | .013 | 95% |

At 550 at bats the batting average is 53% reliable as an indicator of what the player should hit, but the average hitter is still 14 points away from where he should be—meaning, of course, that he is sometimes exactly on target and sometimes 30 points off. If a player hits .330 one year and .285 the next, the media will give you explanations for it, but it doesn’t really mean anything; it’s just random fluctuation. It’s just something that happens. The chip shots and line drives caught by the shortstop don’t even out in 550 at bats. It takes more than a season for that to happen.

Let’s go now to the multi-season numbers:

| At Bats | Reliability of Batting Average | Average Error of Batting Average | % of Players within 30 Points of True Average |
|--------:|-------------------------------:|---------------------------------:|----------------------------------------------:|
| 1000 | 64% | .011 | 98% |
| 1500 | 70% | .009 | 99% |
| 2000 | 73% | .008 | 100% |
| 2500 | 76% | .007 | 100% |
| 3000 | 77% | .007 | 100% |
| 4000 | 81% | .006 | 100% |
| 5000 | 83% | .005 | 100% |
| 6000 | 84% | .005 | 100% |

Of note there: after 1500 at bats the average player’s batting average is within 10 points of what he should hit. It takes about three years as a regular for the AVERAGE gap between a player’s batting average and his true skill level to shrink to less than 10 points. By 2000 at bats, 100% of players are hitting within 30 points of what they should hit—not a true 100%, but 100% rounded off. There was one "player" in the study who was still 31 points off target after 4,146 at bats.

6,000 at bats is a pretty long career, ten years as a regular or a little more. After 6,000 at bats you can be really, really certain that a player’s batting average is within 30 points of his true skill level, and you can be 90% sure that it is within 10 points. And finally, this chart:

| At Bats | Reliability of Batting Average | Average Error of Batting Average | % of Players within 30 Points of True Average |
|--------:|-------------------------------:|---------------------------------:|----------------------------------------------:|
| 7000 | 86% | .004 | 100% |
| 8000 | 87% | .004 | 100% |
| 9000 | 87% | .004 | 100% |
| 10000 | 88% | .004 | 100% |
| 11000 | 89% | .003 | 100% |
| 12000 | 89% | .003 | 100% |
| 13000 | 90% | .003 | 100% |

I stopped at 13,000 because that’s about as long as a career can go, but also because the data was stressing out my poor little computer. 1,000 players and 13,000 events for each player, and it takes 9 cells to generate and record each event, figure each batting average, and compare it to the "true" batting average; with some other counting stuff the spreadsheet is over 250 million cells, which a small personal computer has been known to complain about. Fortunately I don’t have to keep all 250 million cells "live" at the same time; I don’t think my computer would do that.

One more little point. . . .a low batting average achieves a higher level of reliability than a high batting average, in the same number of at bats. In 200 at bats, a .300 hitter is more likely to hit .350 than a .200 hitter is to hit .250. Fairly obvious reasons for that, so I’ll skip the explanation.
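The skipped explanation is binomial variance: the spread of an observed average around a true average p over n at bats goes as the square root of p(1-p)/n, which grows as p moves toward .500. A quick check, with the standard deviation expressed in points:

```python
from math import sqrt

def sd_points(true_avg, at_bats):
    # Binomial standard deviation of the observed average, in points
    # (thousandths of batting average).
    return sqrt(true_avg * (1 - true_avg) / at_bats) * 1000

# A .300 hitter's average wanders more than a .200 hitter's over the
# same 200 at bats, so the higher average is the less reliable one.
print(round(sd_points(.300, 200)))  # prints 32 (points)
print(round(sd_points(.200, 200)))  # prints 28 (points)
```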

Anyway, the conclusion is that for all of the bloops and bleeders to reliably even out, so that the batting average represents skill and conditions but is no longer meaningfully affected by luck. . .that takes much, much longer than any player’s career. In 13,000 at bats, a number that almost nobody reaches, a batting average is 90% reliable, but the average player is still 3 points over what he deserved or 3 points under it. As to how many at bats it would take before the batting average of a player is 100% accurate as to his skill level, I really don’t know, but it is obvious from the data that it would be more than 50,000 at bats.

One other note of explanation. . .. .I am not able to comment on your responses to this or any other article. It’s a software issue. You may have noticed that our software is not always what we would want it to be. The software architecture was created 15 years ago by a company that we have not worked with for quite a few years now, and it’s a struggle for us to keep it working.

I had this problem with the software, which was that I had two log-in IDs with the same name but different passwords, one of which took me to the reader experience and one of which took me to the Admin site. When the Admin software forced me to update my password, which it does pretty often, something weird would happen that we don’t understand, and I would have to re-create my entry path into the reader portion. That was annoying, so I asked the computer guys if they could fix it so that I had only one log-in, and they did, but after they fixed it I was never able to get on to the site as a user at all. I can get on to READ it, as anyone can; I can read your posts, but I can’t respond to them, haven’t been able to for months. So. . .it’s awkward. You probably know that our software is awkward, but you have no idea how awkward it really is. I offer you my apologies, and my regrets for my inability to respond to your comments. There is probably some way to do it, and there is probably some way to train a hummingbird to light on your finger, but I have no idea what either one of those is. Thanks for reading.

## COMMENTS (22 Comments, most recent shown first)

DefenseHawk

Fireball Wenz,

I agree with you. It would be more realistic for 1980 Doug Rader in a simulation to have his stats based on, say, equal weight given to 1978/1979 and 1980. With a batting average somewhere in the .260s, he could still have a chance to hit .320, as well as .200.

In other words, smooth out the outlier stats or seasons. The trouble is that it's a lot of work, though less so with computer sims like DMB.

One could go about and re-work Mickey Mantle's stats to avoid outlier seasons and stats so that his 1956 and 1957 seasons are based on a batting average of, say, .330. He'd still have a decent chance of hitting .356 and .365, but also .306 and .304 (his actual 1955 and 1958 B.A.). But there are some replayers who will always be disappointed if Mantle doesn't hit [i]exactly[/i] 56 homeruns in 1961; anything else to them is [i]unrealistic[/i].

That disconnect between what some replayers will accept and what other replayers [i]expect[/i] is what DrArbiter was talking about when he wrote:

(a) have a program that simulates baseball, or (b) have a program that outputs season statistics similar to real life.

He gave another example:

[i]Suppose we built an entire run of season disks, say from 1920 to today, based on the sort of principles we've been talking about, where we try our best to rate players based on their established talent rather than year-by-year performance, and we ran a "replay of baseball history." What would the time evolution of the HR record look like? Of course, it would be different in every run of the replay, but we can be fairly sure of the overall pattern. Sometime in the 1920s, someone, probably someone named George, would hit about 60 HRs at some point. Then, there would be a series of challengers every couple years, and those challengers would almost certainly have the "right" names -- Wilson, Foxx, Greenberg, Kiner, Mays, Mantle, perhaps Maris. Which one broke the record would probably be different -- or maybe they'd all still fall short. Maybe Reggie Jackson would be the one; maybe Albert Belle would do it. But Brady Anderson would almost certainly not even come within 10, and you wouldn't have to worry about Duane Kuiper showing up on the list. But, most likely, the record stays at near 60 up until the late 1990s.[/i]

[i]On the other hand, if you run the same exercise using actual season disks, I bet it's even odds the record is 70 by the time 1962 rolls around.[/i]

[i]In my exercise, all the players still have the right "shape" -- they all have the roughly the same strengths and weaknesses they do in real life -- but their specific good years and bad years will change, and how high their peaks and low their valleys are changes too -- but within the same range.[/i]

As for the Brett/Rader scenario, in DMB player tendencies can be set so that NO ONE ever pinch hits for George Brett (especially if you're using 1980, when he hit .390) if you are using the computer manager option.

1:20 AM Aug 27th

shthar

Dave Rader hit .291 two years in a row, playing in Candlestick, @700 PAs.

That might actually be more impressive than hitting .320 playing for Boston.

1:04 AM Aug 27th

Fireball Wenz

Even if you limit a player's plate appearances to the number he had in the season, it allows a table-game manager to save him for crucial situations - bases loaded in the ninth, now batting for George Brett, 1980 Dave Rader!

The question becomes: do you want the simulation to focus on reflecting the TEAM's outcome (Rader hit what he did for the Red Sox in 1980), or the INDIVIDUAL's actual ability level (Rader never appeared in an MLB game after 1980, because everyone knew that no matter his stats, he was no .320 hitter)?

I always focused on trying to estimate the individual's ACTUAL ability levels, not his one-season stats.

10:36 AM Aug 26th

abiggoof

You mean Roger LaFrancois isn’t the greatest hitting catcher ever? Lifetime .400!

My brother and I came up with various rules over the years for use of guys like LaFrancois, Henry Mercedes (an .800 season) and Gus Polidor (1.000 season).

10:39 AM Aug 25th

phorton01

Note to Bill - I guess this would be very inconvenient for you, but couldn't you just create a second account for yourself where you log in just like one of us schlubs? I guess it would be a pain to have a second tab open in your browser all the time to comment, but at least you could use it to tell me how stupid I am.

8:56 AM Aug 25th

Rox26bez

In regards to the Hank Aguirre situation... I was in a tournament once and in every game my opponent brought George Puccinelli in to pinch hit (16 AB, .563, 3 HRs in 1930). Ugh!

1:29 PM Aug 24th

3for3

In my old Strat-O-Matic league, you could only bat the number of times you did during the season, unless you qualified for the batting title. Also, Strat didn’t make cards for low AB totals. Pitchers were given a rating, IIRC, not a card based on their actual stats.


6:02 AM Aug 24th

Manushfan

Mr Manush hit .330 lifetime despite committing the unpardonable sin of not being Paul Waner. It's a good time.

4:59 AM Aug 24th

jgf704

I agree with willibphx that this analysis would be a universal property of binomial distributions.

willibphx wrote: "The confidence in the reliability looks very high compared to a binomial calculation in the very low AB scenarios. "

I would agree. For example, the article quotes an average error for 10 AB of "more than 28 points". I think the *actual* average error is closer to 70 points. However, the average error quoted by Bill is low because of a quirk in how he defined the error: "errors larger than 30 points are being effectively counted as 30 points".

6:27 PM Aug 23rd

DefenseHawk

Bill sure isn't kidding about the issues with the software. Let me try this again with only ONE link to the referred DrArbiter post. (His main quote below is found on the 3rd page of the linked post.)

First, I'd like to address the question as to what to do about Hank Aguirre in a 1967 replay so that his hitting in a simulation is more akin to his real-life ability.

As any aficionado of baseball simulations knows, you are not the only one who has come across a "Hank Aguirre" problem. Or it might be a 1974 "Rick Auerbach" problem, where an otherwise .220 career hitter batted .342 in 73 at bats for the Dodgers.

There's only one way to "fix" this in order to have a player perform more like his real-life ability: adjust his "card" (or in this case, the inputted stat line) the simulation uses to randomly create the result on any given plate appearance. Perhaps use Aguirre's 1966 stats in your sim instead.

My preferred sim is Diamond Mind Baseball (DMB). I've played APBA for Windows and of course Strat in the pre-computer days. Extra Innings as well. With DMB it is fairly easy to "re-card" a player. Many of them might be late-season call-ups (where I believe MLEs should be used rather than weird .100 or .500 batting averages or 11.00 ERAs). Naturally, in a league with other players there needs to be some pre-season agreement as to what to do with outlier seasons for some players.

Of course, some replayers will scream, "But that .342 batting average was Auerbach's real-life ability - in 1974!"

But the trouble with NOT adjusting the player's "card" is that the simulation will compound that outlier Auerbach in 1974 with a variance that might allow Auerbach (even if you create a "rule" to use him in no more than 73 at bats) to sometimes hit .500 in your replay.

Take Roger Maris, for example. He never hit more than 39 homeruns in a season - except in 1961 when he smacked 61. One way to ask the question is, if you replayed "real-life" how many times would Maris hit 61 homeruns in 1961? The answer? Every time. Whether you re-ran "real-life" once, twice or 1,000 times.

But a baseball simulation isn't recreating "real-life." It is recreating an alternate reality where probabilities for each plate appearance are based on real-life results over the course of the entire season, not on any given plate appearance.

In other words, there is no guarantee in a simulation that Maris will go 0 for 3 with one strikeout in an opening day 6-0 loss against the Twins at Yankee Stadium, whether you use the exact lineups in your replay or not. So, therefore, there is no guarantee that he will hit exactly 61 homeruns by the end of your simulation, either.

Instead, the simulation is going to use his 1961 batting line and create a "card" that over thousands and thousands of replays will result in Maris "averaging" 61 homeruns in replay simulations of the 1961 season. Sometimes he will hit 61, sometimes 45, sometimes 83. The larger the total replays you use, the closer you will get to that 61 "average."

A fellow by the user name of "DrArbiter" wrote about what he called "double-counting the variance" in a post on a Diamond Mind Baseball forum years ago.

In other words, using a "card" for a player who had an outlier season (in other words, just like the one Bill described for Hank Aguirre in 1967).

https://www.tapatalk.com/groups/dmbforum/what-are-these-distortions-season-disks-cause-t7889.html

DrArbiter said:

The fault lies not in DMB, but in the season disk concept. Season disks have too much variance in the underlying talents of players, compared to the variance in their talents in real life. And this isn't a problem just with DMB's season disks -- it's a problem with every replay-oriented game ever made. To put it simply, there's a dichotomy. You can either (a) have a program that simulates baseball, or (b) have a program that outputs season statistics similar to real life. You can't have both -- and notice that the word 'baseball' doesn't appear in my option (b). What you're left with if you employ the ex-post-facto methods for "fudging" (which is a polite word for the game "cheating") is something that ain't baseball. The statistically sound way to build disks based on real-life players is one where, as Eric wrote, we need to, to use the term, "fudge" the inputs. The inputs ought to be our best estimates of what the player's current talents are, based upon his observed performance. This is what the projection disk essentially does, so it's not as if it's an unheard-of concept.

As for Roger Maris and 1961, I did a test to see how he (and others) would do if Maris' DMB "card" for 1961 would project him to hit 48 homeruns.

After 30 simulations, Maris led the league in HRs only three times (57, 54 and tying Mickey Mantle with 51). He averaged 46 homeruns, ranging from 34 to 59 (Mantle hit 62 during the sim when Maris hit 59). Babe Ruth's record was broken 13 times: 9 times by Mantle and twice each by Rocky Colavito and Harmon Killebrew.

Mantle, who in real-life hit 54 homeruns in 1961, averaged 56 over the 30 simulations. Killebrew averaged 46 (he actually hit 46 that year). Mantle ranged from 41 to 69; Killebrew ranged from 32 to 68.

As I understand it - and I'm sure the advanced mathematicians on BJO could explain it better than I can - by using a binomial distribution table, I found that a ballplayer who hit 50 homeruns in real life should be expected to hit exactly 50 homeruns in a simulation on average 5.9 times over the course of 100 simulations.

But he'd also be expected to hit 63 homeruns 1% of the time and 70 homeruns 1/10% of the time (or once in 1000 simulations).

8:04 AM Aug 23rdDefenseHawkFirst, I'd would like to address the question as to what to do about Hank Aguirre in a 1967 replay so that his hitting in a simulation is more akin to his real-life ability.

As any aficionado of baseball simulations knows, you are not the only one who has come across a "Hank Aguirre" problem. Or it might be a 1974 "Rick Auerbach" problem, where an otherwise .220 career hitter batted .342 in 73 at bats for the Dodgers.

There's only one way to "fix" this in order to have a player perform more like his real-life ability: adjust his "card" (or in this case, the inputted stat line) the simulation uses to randomly create the result on any given plate appearance. Perhaps use Aguirre's 1966 stats in your sim instead.

My prefered sim is Diamond Mind Baseball (DMB). I've played APBA for Windows and of course Strat in the pre-computer days. Extra Innings as well. With DMB is is fairly easy to "re-card" a player. Many of them might be late season call-ups (where I believe MLEs should be used rather than wierd .100 or .500 batting averages or 11.00 ERAs.) Naturally, in a league with other players there needs to be some pre-season agreement as to what to do with outier seasons for some players.

Of course, some replayers will scream, "But that .342 batting average was Auerbach's real-life ability - in 1974!"

But the trouble with NOT adjusting the player's "card" is that the simulation will compound that outlier Auerbach in 1974 with a variance that might allow Auerbach (even if you create a "rule" to use him in no more than 73 at bats) to sometimes hit .500 in your replay.

Take Roger Maris, for example. He never hit more than 39 homeruns in a season - except in 1961 when he smacked 61. One way to ask the question is, if you replayed "real-life" how many times would Maris hit 61 homeruns in 1961? The answer? Every time. Whether you re-ran "real-life" once, twice or 1,000 times.

But a baseball simulation isn't recreating "real-life." It is recreating an alternate reality where probabilities for each plate appearance are based on real-life results over the course of the entire season, not on any given plate appearance.

In otherwords, there is no guarantee in a simulation that Maris will go 0 for 3 with one strikeout on an opening day 6-0 loss against the Twins at Yankee Stadium, whether you use the exact lineups in your replay or not. So, therefore, there is no guarantee that he will hit exactly 61 homeruns by the end of your simulation, either.

Instead, the simulation is going to use his 1961 batting line and create a "card" that over thousands and thousands of replays will result in Maris "averaging" 61 homeruns in replay simulations of the 1961 season. Sometimes he will hit 61, sometimes 45, sometimes 83. The larger the total replays you use, the closer you will get to that 61 "average."

A fellow by the user name of "DrArbiter" wrote about what he called as "double-counting the variance" in a post on a Diamond Mind Baseball forum years ago.

In otherwords, using a "card" for a player who had an outlier season (in otherwise, just like the one Bill described for Hank Aguirre in 1967).

https://www.tapatalk.com/groups/dmbforum/what-are-these-distortions-season-disks-cause-t7889.html

DrArbiter said:

The fault lies not in DMB, but in the season disk concept. Season disks have too much variance in the underlying talents of players, compared to the variance in their talents in real life. And this isn't a problem just with DMB's season disks -- it's a problem with every replay-oriented game ever made.To put it simply, there's a dichotomy. You can either(a) have a program that simulates baseball, or(b) have a program that outputs season statistics similar to real life.You can't have both -- and notice that the word 'baseball doesn't appear in my option (b). What you're left with if you employ the ex-post-facto methods for "fudging" (which is a polite word for the game "cheating") is something that ain't baseball.The statistically sound way to build disks based on real-life players is one where, as Eric wrote, we need to, to use the term, "fudge" the inputs. The inputs ought to be our best estimates of what the player's current talents are, based upon his observed performance. This is what the projection disk essentially does, so it's not as if it's an unheard-of concept.https://www.tapatalk.com/groups/dmbforum/what-are-these-distortions-season-disks-cause-t7889-s10.html</a>

As for Roger Maris and 1961, I ran a test to see how he (and others) would do if Maris' DMB "card" for 1961 projected him to hit 48 home runs.

After 30 simulations, Maris led the league in home runs only three times (with 57, 54, and a 51 that tied Mickey Mantle). He averaged 46 home runs, ranging from 34 to 59 (Mantle hit 62 in the simulation in which Maris hit 59). Babe Ruth's record was broken 13 times: nine times by Mantle and twice each by Rocky Colavito and Harmon Killebrew.

Mantle, who in real life hit 54 home runs in 1961, averaged 56 over the 30 simulations. Killebrew averaged 46 (he actually hit 46 that year). Mantle ranged from 41 to 69; Killebrew ranged from 32 to 68.

As I understand it - and I'm sure the advanced mathematicians on BJO could explain it better than I can - using a binomial distribution table, I found that a ballplayer who hit 50 home runs in real life should be expected to hit exactly 50 home runs in a simulation 5.9 times, on average, over the course of 100 simulations.

But he'd also be expected to hit exactly 63 home runs 1% of the time, and 70 home runs 0.1% of the time (or once in 1,000 simulations).
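Those figures can be checked directly from the binomial probability mass function. A sketch, assuming the 50-home-run hitter gets 550 at bats (the at-bat count is my assumption - the comment doesn't state one, and the answer shifts somewhat with it):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 550          # assumed at bats
p = 50 / n       # per-at-bat HR rate for a 50-HR hitter

for k in (50, 63, 70):
    print(f"P(exactly {k} HR) = {binom_pmf(k, n, p):.4f}")
```

With these assumed inputs, the chance of exactly 50 comes out near 6%, exactly 63 near 1%, and exactly 70 on the order of once per thousand simulations - in line with the figures above.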

7:59 AM Aug 23rd

willibphx
Is this not a binomial distribution question? The confidence in the reliability looks very high compared to a binomial calculation in the very low-AB scenarios. This same logic can be applied to HRs, triples, walks, etc.

On the board-game topic, one adjustment I have seen is to add essentially replacement-level stats for the player to get him up to a minimum number of PAs, like 20. I would argue it should be higher.
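That padding adjustment can be sketched like this, using the 20-PA floor from the comment (simplified here to at bats) and a replacement-level average that is purely an illustrative assumption:

```python
def padded_average(hits, at_bats, min_ab=20, repl_avg=0.230):
    """Batting average padded with replacement-level at bats up to a floor.

    min_ab and repl_avg are illustrative assumptions, not anyone's
    published method.
    """
    if at_bats >= min_ab:
        return hits / at_bats
    pad = min_ab - at_bats                  # phantom replacement-level ABs
    return (hits + pad * repl_avg) / min_ab

# Aguirre's 1967 line (1-for-2) stops looking like a .500 hitter:
print(round(padded_average(1, 2), 3))       # prints 0.257
```

Players with real sample sizes are untouched; only tiny samples get pulled toward the replacement level.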

6:39 AM Aug 23rd

TonyClifton
Frank took my comment: by the time you get to at bats in the thousands, doesn't aging start its decline effect?

Also, I wonder how this study would do with WAR.

1:34 AM Aug 23rd

Fireball Wenz
The Hank Aguirre problem is why I never used any "one-season" simulation. Dave Rader shouldn't be better than Carlton Fisk. Even at age 11, I was using three seasons' worth of data in my homemade board games.


1:25 AM Aug 23rd

FrankD
Very interesting article. Regression to the mean can take a long time. I wonder if adding an age correction would improve this, in that in a long career there is usually a physical drop-off - the path to a career .270 over a long time may be something like up to .300 and then down to .250. Or maybe that is making this simple model too complex.

6:32 PM Aug 22nd

Jaytaft
Hank Aguirre, the .500 hitter, terrorizing your Ballpark League... there's a good movie about it, "Aguirre, the Wrath of God." Did it make your list of favorite movies? I remember that a couple of years ago you mentioned having a list - has it been, or will it be, published? I'd love to see it.

Aguirre struck out Ted Williams the first time he faced him. After the game, he asked Williams to autograph the ball. Williams signed it but wasn't happy about it. Aguirre faced Williams again a couple weeks later and, this time, Williams crushed the first pitch for a homer. While rounding the bases, Williams yelled to him, "Get that ball, and I'll sign it, too."

6:00 PM Aug 22nd

evanecurb
Now I understand how Norm Cash hit .361 in 1961 and Glenn Beckert hit .348 in 1971. It happens.

5:05 PM Aug 22nd