BILL JAMES ONLINE

The "Matthew Effect": The Impact of Birth Date on Baseball Success

November 4, 2009

 

 

Why July Children Don’t Make the Major Leagues

 

In a chapter of the 2008 non-fiction book Outliers, author Malcolm Gladwell studies the birth dates of successful hockey players, discovering that a disproportionate number were born towards the beginning of the calendar year.  Gladwell’s explanation for this phenomenon (termed the “Matthew Effect”) relates to the age cutoff date for youth hockey leagues.  The Canadian author points out that the nearly universal cutoff date for hockey leagues is December 31.  Players born shortly after the cutoff date are a few months older and more developed than their average competition and would therefore have a slight advantage. 

 

The advantage compounds.  The slightly older, bigger, and stronger athletes are selected for All-Star teams and travel squads, where they spend more time honing their skills and benefit from superior coaching and added attention.  Within a few years, the slight age advantage has turned into a significant ability gap.  This is not because the players born in January or February were naturally better athletes than the December birthdays, but because they had greater opportunities to develop.

 

While Gladwell’s presentation is littered with selective samples and is lacking in thorough statistical evidence, the point is not lost.  The reader can easily follow the logic and understand the book’s broader theme that much of what Gladwell terms “success” is largely a product of circumstances beyond the individual’s control. 

 

Gladwell notes that we see evidence of the Matthew Effect elsewhere.  The American education system is largely based on arbitrary cutoff dates; in Alabama, for example, the cutoff date for most schools is September 1.  A child born on August 31, 2000 would start kindergarten in August 2005, but a child born one day later would wait an additional year to begin his or her education.  Cutoff dates vary from school to school and state to state, with most falling between August 1 and December 31.  Regardless of when the cutoff date falls, Gladwell’s theory holds water: the oldest children in each grade hold an inherent advantage that does not simply disappear with time.  In fact, Gladwell says that children on the younger end of their grade are underrepresented in colleges and universities by over 10%. 
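The Alabama example boils down to a one-line eligibility rule.  The sketch below uses a hypothetical helper, `kindergarten_start_year`, that is not taken from Gladwell or from any state’s rules.  It assumes, as in the example above, that a child born before the cutoff date starts in the fall of the year he or she turns 5, and that a child born on or after it waits one more year.

```python
from datetime import date


def kindergarten_start_year(birth: date, cutoff: tuple[int, int] = (9, 1)) -> int:
    """Year a child starts kindergarten under one fixed annual cutoff.

    Hypothetical rule modeled on the Alabama example: children born before
    the cutoff (month, day) start in the fall of the year they turn 5;
    children born on or after it wait one more year.
    """
    # Compare (month, day) tuples instead of building a 5th-birthday date,
    # which would raise ValueError for Feb 29 birthdays in non-leap years.
    if (birth.month, birth.day) < cutoff:
        return birth.year + 5
    return birth.year + 6


print(kindergarten_start_year(date(2000, 8, 31)))  # 2005
print(kindergarten_start_year(date(2000, 9, 1)))   # 2006
```

The two children are born one day apart, yet they land in different grades, and the September 1 child becomes one of the oldest in his or her class rather than the youngest.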

 

But of course, we’ll leave it to the politicians to reform the American education system.  Let’s look at a much more enjoyable American institution: baseball. 

 

Applying Gladwell’s Theme to Baseball

 

I will examine the best of the best baseball players: those who reached the Major Leagues.  Before we get into the data, however, what do we expect to find? 

 

For 55 years, the official Little League Baseball age cutoff was July 31.  (In 2006, the cutoff moved to April 30, but no Little Leaguer from 2006 has reached professional baseball yet, so our study is unaffected.)  Other organizations often use the same cutoff (Dixie Youth also moved its cutoff date from July 31 to April 30 for the 2006 season), but some don’t.  Most school-affiliated teams rely on grade level, which, as mentioned earlier, varies from state to state and school to school.  Overall, I’m guessing we’ll see more Major League Baseball players born shortly after July 31, the Little League cutoff for those 55 years.
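To make “born shortly after the cutoff” concrete, here is a small sketch that orders birth months from oldest to youngest within one age group.  The function names are hypothetical, and the sketch assumes the cutoff falls on the last day of a month, as both Little League cutoffs do (July 31 and April 30).

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def months_after_cutoff(birth_month: int, cutoff_month: int) -> int:
    """0 for the month right after the cutoff (oldest), 11 for the cutoff month (youngest).

    Assumes the cutoff falls on the last day of cutoff_month,
    e.g. July 31 -> 7 and April 30 -> 4.
    """
    return (birth_month - cutoff_month - 1) % 12


def oldest_to_youngest(cutoff_month: int) -> list[str]:
    """Birth months ordered from oldest to youngest within one age group."""
    order = sorted(range(1, 13), key=lambda m: months_after_cutoff(m, cutoff_month))
    return [MONTHS[m - 1] for m in order]


for cutoff in (7, 4):  # July 31 (through 2005) and April 30 (from 2006)
    order = oldest_to_youngest(cutoff)
    print(f"cutoff {MONTHS[cutoff - 1]}: oldest {order[0]}, youngest {order[-1]}")
```

With the July 31 cutoff, August-borns come out oldest and July-borns youngest; with the April 30 cutoff, the same logic makes May-borns the oldest.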

 

The Study and Results

 

I will use the publicly available Lahman database, which lists birth dates for most of its 17264 players.  First, we’ll look at every player in the database, dating all the way back to Alexander Cartwright (born April 17, 1820). 

 

There were 16656 players with birthdays listed in the Lahman database.  Spread them out over 365.25 days, and there are an average of 45.6 players born per calendar day.  Exactly 70 major leaguers were born on November 18, the highest one-day total.  Twelve players were born on Leap Day, February 29.

 

Different months have different numbers of days (duh), so we’ll look at the day-by-day trend first.  Leap Day comes every fourth year, so we’ll make an adjustment and look at the number of MLB players born on each day above/below the average of 45.6 (or 11.4 for Leap Day). 
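The per-day baseline and the Leap Day adjustment can be sketched in a few lines.  The helper names are hypothetical; the only inputs are the 16656 dated players and the two single-day counts quoted above (70 on November 18, 12 on February 29).

```python
DAYS_PER_YEAR = 365.25  # averages in the extra Feb 29 every fourth year


def daily_baselines(total_players: int) -> tuple[float, float]:
    """Expected players per ordinary calendar day, and for Feb 29 (a quarter as often)."""
    per_day = total_players / DAYS_PER_YEAR
    return per_day, per_day / 4


def pct_above_average(count: int, total_players: int, leap_day: bool = False) -> float:
    """Percent above (+) or below (-) the expected count for one calendar day."""
    per_day, per_leap_day = daily_baselines(total_players)
    expected = per_leap_day if leap_day else per_day
    return 100 * (count / expected - 1)


per_day, per_leap_day = daily_baselines(16656)
print(round(per_day, 1), round(per_leap_day, 1))              # 45.6 11.4
print(round(pct_above_average(70, 16656), 1))                 # November 18: 53.5
print(round(pct_above_average(12, 16656, leap_day=True), 1))  # February 29: 5.3
```

So November 18, the busiest single day, runs about 53% above expectation, while Leap Day, once adjusted for occurring only every fourth year, is close to average.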

 

Here’s the plot of MLB players born as a percentage above average by day:

Wow, that’s a lot of noise.  Still, if you look closely, it seems like there’s a jump right around August 1st.  Not a bad start. 

Let’s smooth it out a little bit with a five-day moving average:

Look at that jump in August!  The data jumps from around -10% in July to +10% or +15% in the eighth month.  Further, it seems like the August jump holds on, gradually declining into December and January and even into the next summer.  Let’s do one final graph, using 31-day moving averages and shifting our graph to start at August 1:

 

 

The trend looks pretty clear to me.  We see an obvious dip around September 16 that fully recovers by October 1, and after bottoming out in early June we see a small spike from June to July.  But for the most part, we’re witnessing a negatively-sloped linear trend beginning on August 1.
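The article doesn’t say exactly how the moving averages were computed.  The sketch below assumes one reasonable reading: a centered window that wraps around the calendar, so that December 31 sits next to January 1, followed by a rotation so that the series starts on August 1.  The function names are hypothetical, and the input is a toy series, not the Lahman data.

```python
from datetime import date


def circular_moving_average(values: list[float], window: int) -> list[float]:
    """Centered moving average that wraps around the year (Dec 31 sits next to Jan 1)."""
    if window % 2 == 0 or window > len(values):
        raise ValueError("window must be odd and no longer than the series")
    n, half = len(values), window // 2
    return [
        sum(values[(i + k) % n] for k in range(-half, half + 1)) / window
        for i in range(n)
    ]


def start_at(values: list[float], month: int, day: int) -> list[float]:
    """Rotate a 366-entry day-of-year series (Feb 29 included) to begin at month/day."""
    start = date(2000, month, day).timetuple().tm_yday - 1  # 2000 was a leap year
    return values[start:] + values[:start]


# Toy input, not the Lahman data: percent above average for each of 366 days,
# flat at -10% through July 31 and +15% from August 1 (index 213) onward.
toy = [-10.0] * 213 + [15.0] * 153
smoothed = start_at(circular_moving_average(toy, 31), 8, 1)
print(len(smoothed), round(smoothed[0], 2))
```

A 31-day window blurs a sharp step at August 1 into a month-long ramp, which is worth keeping in mind when reading where a jump “begins” on the smoothed graph.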

 

Here are the results in chart form:

 

MLB Players By Birth Month (Lahman Database)

Month      Days   Players  Expected  Above Exp  Pct Above Exp
January    31     1438     1414      24         1.7%
February   28.25  1269     1288      -19        -1.5%
March      31     1369     1414      -45        -3.2%
April      30     1286     1368      -82        -6.0%
May        31     1273     1414      -141       -9.9%
June       30     1213     1368      -155       -11.3%
July       31     1342     1414      -72        -5.1%
August     31     1622     1414      208        14.7%
September  30     1447     1368      79         5.8%
October    31     1579     1414      165        11.7%
November   30     1444     1368      76         5.6%
December   31     1374     1414      -40        -2.8%
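The Expected and Pct Above Exp columns follow from the player counts alone.  Here is a minimal sketch, with hypothetical names, that rebuilds them from the table’s own numbers, crediting February with 28.25 days so that the twelve months add up to 365.25.

```python
DAYS = {
    "January": 31, "February": 28.25, "March": 31, "April": 30,
    "May": 31, "June": 30, "July": 31, "August": 31,
    "September": 30, "October": 31, "November": 30, "December": 31,
}

# Player counts copied from the table above (Lahman, all eras).
PLAYERS = {
    "January": 1438, "February": 1269, "March": 1369, "April": 1286,
    "May": 1273, "June": 1213, "July": 1342, "August": 1622,
    "September": 1447, "October": 1579, "November": 1444, "December": 1374,
}


def month_table(players: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Expected count and percent above expected for each month.

    Expected = (total players / 365.25) x days in the month.
    """
    per_day = sum(players.values()) / sum(DAYS.values())
    table = {}
    for month, count in players.items():
        expected = per_day * DAYS[month]
        table[month] = (expected, 100 * (count - expected) / expected)
    return table


for month, (expected, pct) in month_table(PLAYERS).items():
    print(f"{month:<10} {expected:7.1f} {pct:+6.1f}%")
```

The percentages in the table appear to have been computed from the unrounded expected counts.  This sketch reproduces May’s -9.9%, whereas dividing by the rounded 1414 would give -10.0%.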

 

 

Recent Trends

 

Let’s see if anything has changed recently.  Little League Baseball grew rapidly in the 1950s (its July 31 cutoff dates to around 1951), and the Amateur Draft began in 1965, so recent players should be more affected by age cutoff dates (in theory).  Here’s every player in the Lahman Database born in 1950 or later:

 

MLB Players By Birth Month (Lahman Database since 1950)

Month      Days   Players  Expected  Above Exp  Pct Above Exp
January    31     525      518       7          1.4%
February   28.25  457      472       -15        -3.1%
March      31     495      518       -23        -4.4%
April      30     494      501       -7         -1.4%
May        31     472      518       -46        -8.8%
June       30     432      501       -69        -13.7%
July       31     411      518       -107       -20.6%
August     31     653      518       135        26.2%
September  30     545      501       44         8.8%
October    31     588      518       70         13.6%
November   30     539      501       38         7.6%
December   31     487      518       -31        -5.9%

 

 

If anything, the trend appears to have intensified in recent years.  August shows a spike of over 25%, while July comes in with the lowest rate at about 20% below average.  There is still some month-to-month noise, but the trend is unmistakable.  August, September, and October average out to +16% per month, while the three preceding months (May through July) average about -14%. 
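The three-month figures can be checked against the since-1950 table.  This sketch uses hypothetical helper names and shows two ways of combining months: pooling the counts, or averaging each month’s own percentage.  It works from the rounded expected values printed in the table, so the last decimal place may differ slightly from the table’s percentages.

```python
# Counts and expected values copied from the since-1950 table above.
SINCE_1950 = {
    # month: (players, expected)
    "May": (472, 518), "June": (432, 501), "July": (411, 518),
    "August": (653, 518), "September": (545, 501), "October": (588, 518),
}


def pooled_pct(months: list[str]) -> float:
    """Percent above expected for several months combined (sum of players vs. sum of expected)."""
    players = sum(SINCE_1950[m][0] for m in months)
    expected = sum(SINCE_1950[m][1] for m in months)
    return 100 * (players - expected) / expected


def mean_of_monthly_pcts(months: list[str]) -> float:
    """Simple average of each month's own percent above expected."""
    return sum(100 * (p - e) / e for p, e in (SINCE_1950[m] for m in months)) / len(months)


late_summer = ["August", "September", "October"]
preceding = ["May", "June", "July"]
for label, months in (("Aug-Oct", late_summer), ("May-Jul", preceding)):
    print(label, round(pooled_pct(months), 1), round(mean_of_monthly_pcts(months), 1))
```

Both methods put the August-October block about 16% above expectation and the May-July block about 14% below it.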

 

The Baseball Info Solutions (BIS) database contains all major and upper minor league players active since 2002, a total of 5363 players with valid birthdays (there are several without a birthday listed).  Here is a similar chart from the BIS data:

 

MLB Players By Birth Month (BIS Database)

Month      Players/Day  Pct Above Expected
January    14.0         -4.9%
February   13.1         -10.8%
March      13.7         -7.1%
April      14.7         -0.2%
May        14.0         -4.7%
June       13.7         -6.7%
July       12.5         -15.4%
August     18.6         26.6%
September  17.8         21.1%
October    15.9         8.2%
November   14.7         0.1%
December   13.7         -6.9%

 

This chart tells the same story.  August and September are over 20% spikes, while July is a full 15% below average.  Across the full year, the trend is less consistent, but we might expect that from a smaller sample of data. 

 

It’s pretty clear that August and September birthdays have an edge in reaching the Major Leagues.  Once they arrive, does the advantage still hold? 

 

MLB Hitting Stats By Birth Month (Players born 1950-present)

Month      AVG    OBP    SLG
January    0.261  0.321  0.377
February   0.263  0.324  0.382
March      0.264  0.325  0.386
April      0.265  0.325  0.385
May        0.265  0.327  0.389
June       0.263  0.324  0.384
July       0.261  0.321  0.378
August     0.263  0.325  0.384
September  0.261  0.322  0.381
October    0.262  0.323  0.378
November   0.261  0.322  0.385
December   0.262  0.320  0.380
Total      0.262  0.323  0.382

 

Well, yes: August-borns do hit slightly better than July-borns in batting average, on-base percentage, and slugging average.  However, MLB players born in the spring months of March through May match or slightly exceed the August-borns no matter which measure you use. 

 

Interestingly, even though there are significantly more August birthdays than July birthdays, the average August-born hitter is just as good as or better than the average July-born hitter.  Put differently, if the extra August-borns were all marginal or below-average major leaguers, the August hitting numbers would sag below July’s.  Since we don’t see that drop, August does not appear to be selective: it seems to produce more above-average and more below-average hitters alike.

 

Seasonal Birth Rates

One thing Gladwell didn’t address is the distribution of birthdays across the entire population.  As it turns out, the late-summer months July through September see roughly 4-5% more births per day than average, while every month from November through May runs below average, with January the lowest:

 

U.S. Births by Month, 1995-2002

Month            Total Births  Births per Day  Above/Below Average (%)
January          2,582,009     10,411          -4.0%
February         2,409,565     10,662          -1.7%
March            2,645,413     10,667          -1.6%
April            2,537,816     10,574          -2.5%
May              2,673,858     10,782          -0.6%
June             2,629,368     10,956          1.1%
July             2,788,695     11,245          3.7%
August           2,813,582     11,345          4.6%
September        2,740,831     11,420          5.3%
October          2,694,594     10,865          0.2%
November         2,532,156     10,551          -2.7%
December         2,631,533     10,611          -2.1%
1995-2002 Total  31,679,420    10,842          0.0%
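The Births per Day column depends on how many calendar days each month contributes over the eight years, including the two leap years (1996 and 2000).  Here is a minimal sketch, with hypothetical names, that recomputes it from the monthly totals using Python’s standard `calendar.monthrange`.

```python
from calendar import monthrange

YEARS = range(1995, 2003)  # 1995 through 2002, including leap years 1996 and 2000

# Total births by month, copied from the table above.
BIRTHS = [
    2_582_009, 2_409_565, 2_645_413, 2_537_816, 2_673_858, 2_629_368,
    2_788_695, 2_813_582, 2_740_831, 2_694_594, 2_532_156, 2_631_533,
]


def days_in_month(month: int) -> int:
    """Calendar days in this month summed over every year in YEARS."""
    return sum(monthrange(year, month)[1] for year in YEARS)


overall_per_day = sum(BIRTHS) / sum(days_in_month(m) for m in range(1, 13))
for month, total in enumerate(BIRTHS, start=1):
    per_day = total / days_in_month(month)
    print(f"{month:2d} {per_day:8,.0f} {100 * (per_day / overall_per_day - 1):+5.1f}%")
```

February contributes 226 days (8 x 28 + 2), and the full period contributes 2,922 days, which gives the 10,842 births-per-day average in the table’s last row.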

 

Assuming this trend holds up for earlier years, this information is enough to explain some of the spike from June to July, for example, but hardly justifies the large jump from July to August.  July actually sees nearly as many births as August. 
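A quick ratio comparison shows how little the birth-rate pattern can explain.  This sketch makes the same assumption as the paragraph above: that 1995-2002 seasonal birth rates also describe the years in which post-1950 players were born.  Both months have 31 days, so raw player counts can be compared directly.

```python
# Lahman players born 1950 or later (from the since-1950 table); both months have 31 days.
mlb_august, mlb_july = 653, 411
# U.S. births per day, 1995-2002 (from the table above).
us_august, us_july = 11_345, 11_245

mlb_ratio = mlb_august / mlb_july    # August-born major leaguers per July-born one
birth_ratio = us_august / us_july    # August births per July birth, general population
residual = mlb_ratio / birth_ratio   # the part birth seasonality leaves unexplained

print(f"MLB {mlb_ratio:.3f}, births {birth_ratio:.3f}, unexplained {residual:.3f}")
```

Under that assumption, August produces about 59% more major leaguers than July, while the general population has less than 1% more August births per day.  Nearly all of the gap remains after the adjustment.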

 

 

Conclusions

 

What did we learn from all of this?  Even after considering month-to-month birth rate variation, it is clear that the Matthew Effect holds true in baseball, particularly since the standardization of youth baseball in the 1950s.  Potential ballplayers born shortly after the July 31 Little League cutoff date hold a sizeable advantage over their younger competition.  This disparity does not wear off quickly, resulting in roughly 20% more MLB players born in August than expected and roughly 20% fewer born in July. 

 

Many current major leaguers probably weren’t born with “major league talent” but stood out because they were slightly older and more developed.  As a result, they received more opportunities to develop their abilities further and parlayed that early edge into a significant ability gap and a Major League career.  On the flip side, some Little Leaguers with plenty of raw talent get edged out by slightly older opponents somewhere along the line.

 

Bill James’s article “The Player Passages Model” on Bill James Online has an excellent illustration of how talent gets filtered up to the Major League level.  Bill’s model explains this process far better than I can with words alone. 

 

What does the Matthew Effect mean for baseball?  As Bill astutely points out, there’s probably more “major league talent” out there than is currently being developed.  Some naturally talented baseball players run into a wall early in their playing days because they don’t stand out among their slightly older teammates and opponents.  These players might not receive the same attention and additional development, or they might quit baseball altogether.  They might even switch to another sport where their birthday gives them a slight advantage instead.

 

The Matthew Effect doesn’t necessarily imply that teams should prefer players born in August over players born in July.  We saw that hitters born in any particular month don’t appear to hold any advantage over hitters born in other months once they reach the majors.  By the time players reach draftable age, they have presumably developed so far past their innate talent level that their birth date no longer lends an advantage or disadvantage.

 

However, this does illustrate the flaws with the way we (as a society) use age in years as an evaluative measure.  The Matthew Effect is perhaps more devastating in non-athletic settings.  As mentioned previously, Gladwell notes its impact on education.  While only a handful of exceptional baseball players reach the major leagues, every American child is impacted by the country’s education system.  What if younger students scored 100 points worse on the SAT than their older classmates?  Clearly the education system would be failing to make the most out of the next generation of citizens. 

 

Of course, fixing this problem isn’t simple.  Without working out all of the details, some sort of revolving age cutoff might help.  If fall baseball leagues used a different cutoff date than spring leagues, the disadvantages would theoretically be lessened.  Similarly, schools could promote students between grades more than once per year based on birthdates.  Or we could simply be more conscious of an individual’s age relative to their competition when evaluating their abilities.

 

However, if organized baseball could manage to develop these overlooked ballplayers as well as it develops August-born talent, there would be about 25% more MLB-caliber players.  The league could expand to 7 or 8 additional markets without a drop in quality, leading to more revenues for the league, owners, players, and individual cities.
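The 25% and “7 or 8 markets” figures can be reproduced with simple arithmetic.  This sketch makes explicit assumptions that are not stated in the article: “developing them as well as August-born talent” is modeled as every day of the year producing major leaguers at August’s per-day rate (players born since 1950), MLB has 30 teams (its size in 2009), and extra talent converts linearly into roster spots.

```python
# Lahman players born 1950 or later (from the since-1950 table).
total_players = 6098           # sum of all twelve months
august_per_day = 653 / 31      # the best-performing month

# Hypothetical: every day of the year produces major leaguers at August's rate.
players_if_all_august = august_per_day * 365.25
extra_share = players_if_all_august / total_players - 1

MLB_TEAMS = 30                           # league size in 2009
extra_teams = MLB_TEAMS * extra_share    # assumes talent converts linearly into roster spots

print(f"{extra_share:.1%} more players, about {extra_teams:.1f} more teams")
```

The result is a little over 26% more players, or just under eight additional teams, consistent with the rough estimate above.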

 

On one final note, these results make Little League Baseball look a bit foolish.  According to their website, Little League Baseball moved the cutoff date from July 31 to April 30 “so that most players on a team will spend the majority of the regular season at the same chronological age as their league age”.  What good does that do?!?  By moving the cutoff date three months, Little League Baseball didn’t actually fix anything.  They essentially transferred the advantage from August birthdays to May.  And on a baseball diamond somewhere in America, a child with an April birthday just watched his Major League aspirations collapse right in front of his eyes.

 

 

Resources

Malcolm Gladwell’s book was the inspiration for much of this article.  Here’s the Amazon.com link: 

http://www.amazon.com/Outliers-Story-Success-Malcolm-Gladwell/dp/0316017922

 

Thanks to Bill James for his fascinating article (entitled “The Player Passages Model”) on Bill James Online, which relates (in part) to Gladwell’s thesis.  Thanks also to Bill James Online subscriber Mark Rice for his question on this topic posted in the “Hey Bill” section of the website.  After reading Outliers, I did the study on my own but didn’t think anyone else would be interested.  After reading Bill’s article and Mark’s question, I knew there were at least two other people who had connected Gladwell’s discussion to baseball. 

 

Birth rates by month taken from one-year sample of CDC data:

http://www.cdc.gov/nchs/data/nvsr/nvsr52/nvsr52_10.pdf

 

Little League age cutoff information is cited from:

http://www.littleleague.org/media/newsarchive/04_2005/05littleleague_011105.htm

and

http://www.eteamz.com/llbeurope/news/index.cfm?id=2877847&cat=0&SitePage=

 

School cutoff dates by state were referenced from: 

http://mb2.ecs.org/reports/Report.aspx?id=32

 

MLB International Signing Date information is cited here:

http://sportsillustrated.cnn.com/2009/writers/melissa_segura/07/02/international.signing/index.html

 

The Lahman Database is an excellent baseball statistics resource, available at baseball1.com. 

 

Seasonal birth rate information from:  http://abcnews.go.com/Health/Science/story?id=990641

 

The Baseball Info Solutions (BIS) players database information was collected on August 14, 2009. 

 

The Matthew Effect seemingly holds in Chess too:  http://main.uschess.org/content/view/8975/343/

 

After writing this article, I found the following blog entry which found the same July-August jump in baseball.  http://sportsologist.com/birth-month-affect-on-baseball-players-part-i/

 

 
 

COMMENTS (10 Comments, most recent shown first)

THBR
Excellent point, ventboys!
11:57 AM Nov 9th
 
jrickert
__Me: It does appear that it's the weaker players driving the nonuniformity in birth month.
_Ben: If that is the case, then how do players born in August have an OPS .010 higher than July-borns?

1. I could be wrong
2. There could be some random variation swamping other effects
3. Perhaps the longer term players effects swamp the minor players effects
4. I could have made a programming error

To take another look, I ran the numbers again, this time computing the batting totals as well.
I totaled PA, AB, H, OB, and TB and used those to compute BA/OBA/SLG and came up with slightly different numbers. We might want to compare them through e-mail (there's a link at the bottom of the article that I linked to earlier)

My hit stats calculation for U.S. hitters born starting 1950 was
Mon BA OBA SLG num low/high PAlow/PAhi
Jan .264 .337 .353 190 158/32 119247/169975
Feb .265 .334 .353 163 149/24 118977/141843
Mar .266 .332 .359 197 163/34 117289/209662
Apr .267 .336 .357 204 160/44 078534/235817
May .270 .342 .361 180 147/33 095530/190808
Jun .270 .338 .359 174 135/39 094652/231683
Jul .270 .341 .363 156 114/42 068054/242749
Aug .265 .336 .356 253 209/44 151654/251509
Sep .266 .333 .354 221 181/40 116340/236080
Oct .262 .331 .349 207 171/36 127058/184293
Nov .265 .336 .357 197 149/48 113281/273647
Dec .265 .335 .355 182 147/35 107730/204667
I'm not sure where our numbers went apart.
The BA/OBA/SLG monthly values range for low PA players was
.250-.255/.314-.321/.328-.339
for high PA player
.270-.279/.341-.353/.362-.375

As others have said, nice article!

4:18 PM Nov 6th
 
jrickert
__Me: It does appear that it's the weaker players driving the nonuniformity in birth month.
_Ben: If that is the case, then how do players born in August have an OPS .010 higher than July-borns?

1. I could be wrong
2. There could be some random variation swamping other effects
3. Perhaps the longer term players effects swamp the minor players effects
4. I could have made a programming error

To take another look, I ran the numbers again, this time computing the batting totals as well.
I totaled, PA,AB,H,OB,TB ad used those to compute BA/OBA/SLG and came up with slightly different numbers. We might want to compare them through e-mail (there's a link at the bottom of the article that I linked to earlier)

My hit stats calculation for U.S. hitters born starting 1950 was
Mon BA OBA SLG num low/high PAlow/PAhi
Jan .264 .337 .353 190 158/32 119247/169975
Feb .265 .334 .353 163 149/24 118977/141843
Mar .266 .332 .359 197 163/34 117289/209662
Apr .267 .336 .357 204 160/44 078534/235817
May .270 .342 .361 180 147/33 095530/190808
Jun .270 .338 .359 174 135/39 094652/231683
Jul .270 .341 .363 156 114/42 068054/242749
Aug .265 .336 .356 253 209/44 151654/251509
Sep .266 .333 .354 221 181/40 116340/236080
Oct .262 .331 .349 207 171/36 127058/184293
Nov .265 .336 .357 197 149/48 113281/273647
Dec .265 .335 .355 182 147/35 107730/204667
I'm not sure where our numbers went apart.
The BA/OBA/SLG monthly values range for low PA players was
.250-.255/.314-.321/.328-.339
for high PA player
.270-.279/.341-.353/.362-.375

As others have said, nice article!

1:40 PM Nov 6th
 
DaveFleming
Welcome aboard, Ben. I was starting to feel a little lonely.

Also: I touched on Gladwell's book in an old article, "The Outlier Year" which is less graphtastic than your work (it was, however, written before the H1N1 pandemic, and it talks a lot about influenza).
12:42 PM Nov 6th
 
evanecurb
Good study, and interesting. Would removing foreign born players from the data have an impact one way or the other? (I am guessing the trend isn't as strong for players outside of the US if the cutoff dates are different).

When I was playing American Legion ball (approx. 100 years ago), it was noticeable how many 18 year old players (the cutoff) every year were college freshmen. All of them had 19th birthdays between Aug. 1 and Sept. 30, so you would have figured one starter per team on average. I think in reality it was more like 3 starting players per team.
9:31 AM Nov 6th
 
ventboys
Now wait a minute. Baseball players are born to baseball FANS, aren't they? Maybe it's not about the player. Maybe it's about the player's parents and their reproducing habits.

July babies are conceived in October, August babies in November. Baseball ends late in October. Conceptions (Concepcions?) dramatically increase in November, and they taper off in March when spring training starts. They settle in for a couple of months, and then they taper off as the season heats up, reaching their nadir in early September. The rate goes up a bit as September goes on (and some fans' teams fall out of the race), then it begins to rise a bit more in October, when the weather gets colder and most teams are eliminated. Come November the baseball fan has no excuses, so the conception rate spikes. The leaves are raked, let's make ballplayers.

I'm kidding, of course (I think). Good stuff, glad to have you here to educate me.
1:15 AM Nov 6th
 
dburba
As someone born in April, I'm glad I can point to something other than an obvious lack of talent for why I'm not a major leaguer.
5:15 PM Nov 5th
 
baseballben
Thanks for the link to your article and the Slate article, jrickert.

I disagree with this statement, though:
"It does appear that it's the weaker players driving the nonuniformity in birth month."

If that is the case, then how do players born in August have an OPS .010 higher than July-borns? Granted, I didn't attach a statistical significance to it, but if weaker players are driving the spike, then how do the averages come out slightly better?

I suspect that while an elite level talent will make the majors regardless of what month he was born in, he'll still see an added edge.
4:21 PM Nov 5th
 
jrickert
Regarding the non-U.S. players I took a similar (but slightly different) look at it a few years ago and posted it at http://homepage.mac.com/johnrickert/baseballbirthmo.html
The US players effect appeared after 1950 and did not show up in the foreign-born players.
The effect was strongest in pitchers with fewer than 100G, then in batters with fewer than 3163 PA.

10:49 PM Nov 4th
 
Trailbzr
Is this restricted to US-born players? If not, that would make a worthwhile comparison.
5:56 PM Nov 4th
 
 
© 2011 Be Jolly, Inc. All Rights Reserved.