
Sea Creatures and Land Animals

By Bill James

December 25, 2012

This is a continuation of a debate which began here on December 12, 2012, just after Wil Myers was traded to Tampa Bay, in the article "On the Differences between Pitching Prospects and Hitting Prospects". There is a rather long preamble here which re-states at length arguments that many of you probably read through in the original, but it’s an important issue, and if somebody is reading this article in ten or fifteen years I want him to have a fair chance to understand everybody’s positions. My local sports columnist had written the following:

In time, Wil Myers might develop into one of the top power hitters in the game. At 21, he hit .314 with 37 home runs and 109 RBIs in 134 games of a season split between Double A (35) and Triple A (99). His pitch recognition might develop to the point he can strike out at a less disturbing rate than 140 times in 522 at bats. He’s an excellent prospect, all right.

The word "might" and "prospect" need not enter discussions about James Shields, the main player acquired by the Royals in the deal with the Tampa Bay Rays.

To which I responded:

Oh, I can give you a long list of mights that enter into the James Shields evaluation, but let’s fast forward. In time, Wil Myers might be something; he isn’t anything yet, but later on, down the road, he might become something. Not trying to parody the sportswriter’s words or to state them unfairly; I think that’s an accurate summation of his point; Wil Myers isn’t anything yet, but later on he might be something.

Most of us guys, I suspect, see the situation a little differently: that Wil Myers is a very good baseball player, right now. He was a very good baseball player in 2012; there is every reason to believe that he will be the same player in 2013, although his statistics will not be the same because the players he will be playing against are better. Later on, he may develop to an even higher level, true, but he is the same thing now that he will be in a year, and therefore the distinction between "prospect" and "player" is, on some level, a silly distinction. It relies on doubt that exists only because of ignorance, and thus exists only for the ignorant.

We cannot make absolutely accurate projections as to what any player will hit next season, whether he is a rookie or whether he has been in the league for ten years. But we can project what Wil Myers will hit in 2013 as accurately as we could project the same if he had been in the league for ten years, and this is a fairly high level of accuracy. The sportswriter thinks of Wil Myers as he does because he fails to understand this. He believes that there is an element of doubt in the equation that is not really there, or does not need to be there. Thus, he is basing his analysis of the trade on a categorization of the players, and basing the categorization of the players on his own ignorance, his own lack of sophistication. It’s an analysis that is based, at the deepest level, on the ignorance of the writer.

The original article (On the Differences Between Pitching Prospects and Hitting Prospects) contains more comments in a similar vein, but I am just trying to reconstruct the essence of the debate, in order to make the points that I wanted to make next. Responding to this, a poster using the name "myachimantis" wrote the following:

I'd like to just mention that it may be easier to project major leaguers simply because there is more information. Perhaps teams collect batted ball data for their minor league systems, but it isn't publicly available. With batted ball data, one can take a look at a hitter and say whether they had a season that was largely the result of a BABIP that was too high given their batted ball profile or that their season was a true measure of their talent and their BABIP accurately reflected their batted ball data. Same goes for pitchers, especially when looking at HR/FB%. We have more information about major league players, making it easier to subtract out the luck from their performance, and make projections going forward.

To this I responded that "You can give many reasons why it SHOULD be easier to project major leaguers than minor leaguers—but the fact is that it isn’t. Minor league hitters can be projected as major league hitters as accurately as major league hitters can be projected as major league hitters. This is probably the most valuable insight of sabermetrics—and has yet to be fully digested or accepted by the sabermetric community, thus remains as an advantage to be exploited by major league teams who do understand this."

MWeddell posted in response to this:

Seems like we are talking past each other in this article's comments. No one is saying that one can't project future performance based on minor league batting statistics. The debate is whether, if everything else is the same, are past minor league batting statistics just as useful at projecting future performance as past MLB statistics are?

Since reading www.insidethebook.com/ee/index.php/site/article/minor_to_major_correlations/ a few years ago, I have considered that minor league batting statistics are less useful than MLB batting statistics are at predicting future performance. If there is evidence to the contrary, it'd be great to know.

At the time Mr. Weddell posted this I let it go without checking the article that he had referenced. But having now checked out that article, I am puzzled by what his point is. Mr. Weddell seems to believe that the article he references argues against the reliability of minor league stats as major league predictors—but in fact, as I would read the article, it actually argues IN FAVOR of the reliability of minor league batting statistics as much as it argues against them. The article he references is straightforward; it doesn’t really pitch an argument one way or the other, it merely presents facts. The data that it presents seem to me to be consistent with my point of view, but perhaps I’m just not reading the data right, I don’t know.

Anyway, someone named Bruce (or claiming to be named Bruce) next posted the following—this still on the day the original article was posted.

Why is it so "absurd" to distinguish between players who have proven the ability to be successful at the major league level and those who haven’t?

Here again, something I didn’t see at the time: Bruce "quotes" the word "absurd", although actually that word had not been used in the discussion up to that point. I had said it was a silly distinction, not that it was absurd. Anyway, to this I responded "Proven, to who? Wil Myers has proven to my satisfaction that he has the ability to be successful at the major league level. I am absolutely, 100% satisfied that he does. When you say that he has not proven this ability, then, you are talking about a distinction that exists only in your mind... or only in my mind; that doesn’t matter, in whose mind the distinction rests. It is absurd to divide players based on a distinction that exists only in your mind."

This, by the way, is still what I think; certain things that I said before were poorly stated and/or were stated in such a way that I couldn’t defend them in a more thoughtful debate, but so far we haven’t hit those things. So far, what I was saying then is still what I believe today.

Someone using the name "jemanji" then posted

Am intrigued by the implication that Wil Myers is (roughly) as good a bet to be a difference-maker in the majors as are (say) James Shields or Alex Gordon or other players who ARE difference-makers in the majors.

In saying that these people ARE difference-makers in the majors, what he actually means is that they HAVE BEEN difference-makers in the past. He thus ASSUMES that the present is more closely connected to the past than to the future, which is not true, and which obviously constitutes a bias in the debate we are having, but let’s move along. Jemanji continues:

I’m a big believer in MLE projections myself.

Small point, which will become important later on. MLEs are not projections. I may well have spoken of them as projections myself, in the past; let’s not get hung up on that. MLEs are like exchange rates, how many Yen equals a dollar. They’re not predictions.

I’m a big believer in MLE projections myself, and a bigger believer in 21-year-olds who rake at AAA. . . But the plot thickens with this second axiom: a few months ago you acknowledged that a prospect who had hit well in the majors for a short time had passed an important test. There are lots of AAA hotshots who turn out to have weaknesses that are exposed by ML precision (Jose Lopez was a cleanup hitter in AAA at age 20 who totaled about 6 WAR for his career). .... My question: what SPECIFICALLY about Myers are you looking at, as opposed to any other 21-year-old who had a 900 OPS+ in AAA? Or are these kids IN GENERAL as valuable as ML impact players?

There are three responses to this, which I gave in various postings over a period of days:

1) That would depend on the generalization. If by "these kids", you are generalizing 21-year-olds to include 22-year-olds, then it is less true. If by "these kids" you include defensive liabilities with young center fielders, then it is less true. If by "these kids" you're including players with an .880 OPS along with players with a .930 OPS, then it is less true. It depends on how you generalize.

2) The "test" that players have to pass is earning the confidence of the major league community. A young actor might well have the acting skills of Alec Guinness. If Hollywood doesn’t believe in him, he’s working local theater.

This "perception of skills" issue is not as large in baseball as it is in acting, or as it is in many "skill" professions. In acting, I would bet that 90% or 95% of those who have the ability to be stars never get the chance to shine. In baseball, most players who have ability will get a fair shot. But it’s not a zero issue in baseball, either.

3) The Jose Lopez example is actually extremely instructive about the profound misunderstandings in this area. If you look at Lopez, it could not possibly be any more apparent that the change in his level of hitting ability occurred at the MAJOR LEAGUE level, after he had been in the majors for five years and had just short of 3,000 plate appearances as a major league player. In 2008--four years after the minor league season that you reference--Lopez hit .297 for the Mariners, with 41 doubles and 17 homers giving him a .764 OPS. The season AFTER that he hit .272 with 42 doubles and 25 homers, giving him a .766 OPS. At the end of that season he had 2,977 major league plate appearances. THEN he stopped hitting. Three thousand plate appearances into his major league career, he stopped hitting.

The uncertainty of projection in Lopez’ case occurred in the middle of his major league career--yet in discussing him, you attribute this... you misattribute this... to his minor league/major league transition, and thus misattribute the uncertainty to his minor league performance. By doing this, you both overstate the uncertainty of projection based on the minor league performance, and understate the uncertainty of projection based on the MAJOR LEAGUE performance. This sustains you in your mistaken belief that minor league hitting stats are not reliable indicators of performance.

Why do you do this? Are you a fool, or are you determined to deceive us?

Well, of course you are not; you are merely doing what all of us do all the time. You have an organized way of thinking about this problem, and so your mind re-arranges the facts to be consistent with that way of thinking about the problem--even though, in truth, those facts are not AT ALL consistent with that way of thinking about the problem. We all dislike re-thinking our assumptions. This self-deception protects your mind from having to re-think your assumptions.

At this point Tom Tango entered the debate. Tango’s first post related to this was:

Yu Darvish and Felix are the same age, but our uncertainty level was higher for Yu entering 2012. And it's still higher today. Wil Myers is around the same age as Starlin Castro, Mike Trout, and Brett Lawrie. In terms of our uncertainty level as to how these four players will play over the next say five years, we have more uncertainty with Myers, simply because he hasn't faced the quality of competition that the other three have. Whether that uncertainty level is "a lot" or "a little" more than for the other three, that's really the question. I think the "proven" discussion makes it seem like it's black/white, when really, we're talking about shades of uncertainty levels.

My first response to this post was, I think, OK. "I agree with you about Darvish vs. Felix," I said. "That was part of my thesis, that this distinction DOES exist with pitchers." I think that was the right thing to say, to resist the mixing up of pitchers with hitters, since the foundation rock of the previous debate was that young pitchers are NOT like young hitters. With the second half of my response to Tom, I think I went awry:

The only uncertainty for Myers is playing time. If he plays, there is no more uncertainty in projecting his performance than Trout's or Lawrie's or any other player's. Where, then, is the "uncertainty" located, if not in the eye of the beholder?

In this post, taking Tom’s interpretation of the issue as my own, I blundered past a critical distinction. Tom is certainly correct in saying that there is a "transitional uncertainty" when a player comes to the major leagues. There is a transitional uncertainty when a player comes from Japan to the United States; there is a transitional uncertainty when a player changes teams; there is a transitional uncertainty when a minor league player comes to the major leagues. It may well be that this transitional uncertainty is greater for the minors-to-majors transition than for these other transitions. I do not agree that Myers’ major league production is less certain or more speculative than that of Starlin Castro or Brett Lawrie, but in stating my argument in such a way that I denied the impact of the transitional uncertainty, I was absolutely incorrect.

Also, at some point in this discussion, Izzy 2112 made the point that the track record for a minor league player is rarely as long as the track record for a major league player, which creates a measure of uncertainty for the minor league player—also a valid point. Wil Myers has played only 99 games of Triple-A baseball, and only 233 games above A ball. 99 games for a player at any level may be atypical of his true skill level. In 2011 Jhonny Peralta hit for a higher batting average than Derek Jeter, each with more than 500 plate appearances. 99 games is not enough to get a true read on a player’s skills.

But this does not fundamentally undermine my point. We know what Wil Myers is capable of, as a major league player. We know this just as much, in the case of Wil Myers, as we would if Myers had played those 99 games in the majors, rather than in the minors.

It’s like this: Suppose that we think of minor league players as sea peoples, or even as sea creatures, and major league players as land animals. The sea creatures are trying to make land, and there is a transitional difficulty inherent in that. Some people drown when they are trying to reach the shore, due to the undertow.

The undertow, at the shore, results not from any force directly pushing the swimmer out to sea, but rather, from the force of the tide pushing toward the shore, but then bouncing off and heading back out to sea. The energy is pushing in, but sometimes the recoil drags people back out. The same thing with rookies. Their focus is on reaching the majors, but sometimes the energy, the intensity of that experience works against them, and pulls them back out to sea. In trying to make the sea-to-shore transition, the minors-to-majors transition, some players will drown. If I ignored this fact in my earlier comments, I was wrong to do that.

Even when they get to shore, the sea peoples still face a transitional difficulty. They have to learn to walk, rather than swim—or, more modestly, they have to learn to walk on land, rather than walking around on the boats. They have to learn to watch out for lions and hippopotamus, rather than sharks and rays. They have to learn to eat the apples, rather than the guppies and koi. Of course this transitional stage will present difficulties for them, and of course the player’s statistics will be different while he is in the transitional stage than when he is either in the sea or when he is established on land. It was stupid of me to trap myself into denying this.

But that doesn’t fundamentally change what I am trying to say. My local sports columnist speaks of major league players and "prospects" as if they were fundamentally different things, like hammerheads and Dobermans. This view of minor league players is common not only in sportswriters, but among baseball professionals—and it is totally baseless. They’re not fundamentally different things. They’re exactly the same thing.

Can you take a team of major league players and a team of minor league players, mix and match them, put some of them on one team and some on the other, and play a game? Of course you can—in fact, once you put the uniforms on them, you’d never know which was which.

Major league and minor league players do in fact play against one another all the time. They play against one another, mixed up into different teams, in spring training. They play against one another in winter ball. In the regular season players go back and forth from the majors to the minors constantly. There are 200 players every year who split their season between major league time and minor league time. They’re not fundamentally different things.

And they do not have fundamentally different statistical profiles. Minor league statistics, properly adjusted for context, provide every bit as accurate a package of information about a player’s skills as do major league statistics. We are not at the end of the debate, but that is still my belief.

Continuing now with the exchange. Tango:

"This is the most important thing you could learn from me if you would stop refusing to learn it. " I would like to see more evidence in that case. My position is simply that every difference in context adds a layer of uncertainty. If a hitter goes from Coors to Oakland to St.Louis (Holliday), or if he moves from Japan to Yankees, or if he moves from AA to MLB, all those changes in context are severe enough that it has to add a level of uncertainty. And the more severe the change in context, then the more uncertainty we have (all other things equal).

Me:

I would certainly agree that there are uncertainties associated with all transitions, at the major league level or minors to majors. A player who is in his first year on a new team--like Carl Crawford coming into Boston in 2011--is demonstrably more likely to have a catastrophic season than is a player who is playing in the same place he was playing the year before.

I do NOT agree that these uncertainties are larger in going minors to majors than in going from one major league setting to another, and I would ask to see the evidence that they are larger.

In my view, I have been providing evidence for my position constantly for 30 years, and the world and the sabermetric community have been explaining it away and refusing to learn for 30 years, because it requires that people re-think their established assumptions. When Juan Gonzalez came to the majors in 1991 or 1992, we published projections for him that proved to be absolutely accurate. When Jason Heyward came to the majors in 2010, we presented projections for him that proved to be extremely accurate... Jason Heyward, and Reid Brignac, and Trevor Crowe, and Ian Desmond. We publish very accurate projections for a dozen or more rookies every year in the Handbook. Why is this not evidence that it is possible to do this?

Let me try this another way... if you are asserting a general theory of statistical uncertainty based on transitions, I doubt that I would disagree with you, and I would tend to accept the theory while awaiting proof. If, on the other hand, you are asserting a specific theory of statistical uncertainty applying uniquely to minor league hitting statistics, then what I would say is that over a period of many years we have presented much more than sufficient evidence to show that these projections can be made accurately.

Tango replied:

I did a very long post in comparing forecasts (2007-2010) by several prominent forecasters (PECOTA, ZiPS, etc), and I had broken it down in several ways. One of the breakdowns was based on "past MLB experience". And the average error in the forecasts for veterans was lower than that of part-time players which was lower than "pure rookies" (no MLB experience). In fact, the amount of error for the pure-rookies forecast was HIGHER than simply giving every pure-rookie an identical league-average forecast. It's extremely long, but this is about as detailed a test of forecasting systems that I've ever done. http://www.insidethebook.com/ee/index.php/site/article/testing_the_2007_2010_forecasting_

This was posted on December 14, two days after the original article. I said "Thanks, I’ll go look at it," and this is where the discussion has rested until now. I have, however, been unable to find the article in question, so I will have to make assumptions about it.

First of all, let me point out one issue. In the years 2000 to 2011 first-year players in the majors had an average of 104 plate appearances. Second-year players had an average of 200 plate appearances (200.2727, so you don’t think I just rounded that off.) Third-year players had an average of 260 plate appearances, fourth-year players an average of 311, fifth-year players an average of 352, sixth-year players an average of 370. The graph actually peaks in the tenth year; players in their tenth season in the major leagues had an average of 384 plate appearances.

Of course we cannot project batting averages, slugging percentages or on base percentages accurately for players who are getting 104 plate appearances apiece, and of course the projections for those players would be more accurate if you just used league norms, rather than individual records, for those players. That’s the James/Stein paradox, named for Charles Stein and Willard James (some other James, not me), whose result dates to 1961. (The familiar baseball illustration came from Bradley Efron and Carl Morris in the 1970s: they took batting averages based on each player’s first 45 at bats of the 1970 season, and found that the averages for the rest of the season were predicted more accurately by estimates pulled heavily toward the group norm than by individual performance.) I assume that Tom did something to adjust for this difference in plate appearances between rookies and veterans, but I don’t know what was done, and I will point out that it would be extremely difficult, if not impossible, to adjust this difference entirely out of existence.
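To make the small-sample point concrete, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration rather than a figure from any study: a league average of .260, a spread of true talent of 25 points, and 104 at bats per player (borrowing the first-year plate appearance average quoted above as a stand-in for at bats). It compares three guesses at each player's true average: his own observed average, the league norm, and a simple blend that pulls the observed average toward the norm. The blend is a plain shrinkage weight, not the exact James/Stein formula.

```python
import random


def simulate(n_players=2000, at_bats=104, league_avg=0.260,
             talent_sd=0.025, seed=1):
    """Mean squared error of three estimates of true batting average.

    All parameters are hypothetical, chosen only to illustrate the point.
    """
    rng = random.Random(seed)

    # Approximate sampling variance of a batting average over `at_bats` trials.
    noise_var = league_avg * (1 - league_avg) / at_bats
    talent_var = talent_sd ** 2
    # Share of the observed deviation from the norm that we choose to believe.
    weight = talent_var / (talent_var + noise_var)

    totals = {"individual": 0.0, "league_norm": 0.0, "shrunk": 0.0}
    for _ in range(n_players):
        true_avg = min(max(rng.gauss(league_avg, talent_sd), 0.0), 1.0)
        hits = sum(rng.random() < true_avg for _ in range(at_bats))
        observed = hits / at_bats
        shrunk = league_avg + weight * (observed - league_avg)

        totals["individual"] += (observed - true_avg) ** 2
        totals["league_norm"] += (league_avg - true_avg) ** 2
        totals["shrunk"] += (shrunk - true_avg) ** 2

    return {name: total / n_players for name, total in totals.items()}


if __name__ == "__main__":
    for name, mse in simulate().items():
        print(f"{name:12s} mean squared error: {mse:.6f}")
```

Under these assumed settings the individual record should come out as the worst of the three guesses, because in 104 at bats the luck is larger than the real differences between players. Raise `at_bats` toward a full season and the individual record starts to win, which is the whole difficulty in comparing rookies to regulars.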

Setting that issue aside, of course we have a transitional issue when we focus only on the batting averages of rookies. But that does not mean that minor league batting statistics are less reliable indicators of batting ability than major league statistics. It merely means that there is a certain perturbation of the data that takes place during the transitional stage.

Let us take, for example, the case of Felix PA, formerly known as Felix Pie. These are Felix Pie’s major league stats for 2007:

	G	AB	R	H	2B	3B	HR	RBI	BB	SO	Avg	OBA	SPct	OPS
2007	87	177	26	38	9	3	2	20	14	43	.215	.271	.333	.604

And this is the projection that we had for Pie in the 2008 Handbook, based obviously mostly on his minor league performance.

	G	AB	R	H	2B	3B	HR	RBI	BB	SO	Avg	OBA	SPct	OPS
2007	87	177	26	38	9	3	2	20	14	43	.215	.271	.333	.604
2008 P	147	533	82	151	30	7	16	62	40	111	.283	.333	.456	.789

And this chart adds in his actual 2008 major league performance:

	G	AB	R	H	2B	3B	HR	RBI	BB	SO	Avg	OBA	SPct	OPS
2007	87	177	26	38	9	3	2	20	14	43	.215	.271	.333	.604
2008 P	147	533	82	151	30	7	16	62	40	111	.283	.333	.456	.789
2008 A	43	83	9	20	2	1	1	10	7	29	.241	.312	.325	.637

OK, we were entirely wrong; his transitional difficulties persisted into the 2008 season—he had only 250 major league at bats at the end of that season—and he continued to have difficulty eating the apples and avoiding the lions. He never actually did get by that stage; we’d have to say that the lions ate him, or the hippopotamus squashed him, or the snake bit him, or something; one of them land animals did him in. Based on the 2007 or 2008 seasons, it would certainly be true that one would get a better projection for him by just taking the league norms.

But was our projection for him really wrong? Let’s look at what Pie did in 2009 and 2010, which were the only major league seasons to date in which Pie got 200 at bats:

	G	AB	R	H	2B	3B	HR	RBI	BB	SO	Avg	OBA	SPct	OPS
2008 P	147	533	82	151	30	7	16	62	40	111	.283	.333	.456	.789
2009	101	252	49	67	10	3	9	29	24	58	.266	.326	.437	.763
2010	82	288	39	79	15	5	5	31	13	52	.274	.305	.413	.718

Our assessment of Pie’s major league abilities not only was not inaccurate; it was, in fact, extremely accurate. Add together his 2009 and 2010 seasons, the only seasons in which he batted 200 times, and compare that to the numbers we had projected for him. It’s uncanny how accurate we were—once Pie got more-or-less past the transitional stage.
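For anyone who wants to do that addition, a minimal sketch follows. The counting stats are copied from the tables above; the helper names are made up for this example, and on-base percentage is left out because the tables do not carry hit-by-pitch or sacrifice fly totals.

```python
# Felix Pie's lines as printed in the tables above.
PROJECTED_2008 = {"AB": 533, "H": 151, "2B": 30, "3B": 7, "HR": 16}
ACTUAL_2009 = {"AB": 252, "H": 67, "2B": 10, "3B": 3, "HR": 9}
ACTUAL_2010 = {"AB": 288, "H": 79, "2B": 15, "3B": 5, "HR": 5}


def combine(*seasons):
    """Add counting stats across seasons, category by category."""
    return {key: sum(season[key] for season in seasons) for key in seasons[0]}


def batting_average(line):
    return line["H"] / line["AB"]


def slugging(line):
    # Total bases: every hit counts one, plus the extra bases on extra-base hits.
    total_bases = line["H"] + line["2B"] + 2 * line["3B"] + 3 * line["HR"]
    return total_bases / line["AB"]


combined = combine(ACTUAL_2009, ACTUAL_2010)

print(f"2009-10 combined: {combined['AB']} AB, "
      f"AVG {batting_average(combined):.3f}, SLG {slugging(combined):.3f}")
print(f"2008 projection:  {PROJECTED_2008['AB']} AB, "
      f"AVG {batting_average(PROJECTED_2008):.3f}, "
      f"SLG {slugging(PROJECTED_2008):.3f}")
```

The combined 2009-2010 line works out to 540 at bats, a .270 average and a .424 slugging percentage, against a projection of 533 at bats, .283 and .456.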

Pie’s minor league record fails to "predict" his rookie-season and second-season performance not because his minor league records are misleading, but because his rookie season and second-season records are misleading. The minor league records, the MLEs, are right. It’s his major league numbers that are screwy.

Here’s Erick Aybar’s 2007 projection, compared to what he actually would do in 2007:

	G	AB	R	H	2B	3B	HR	RBI	BB	SO	Avg	OBA	SPct	OPS
2007 P	124	466	69	122	24	4	6	69	20	51	.262	.292	.369	.661
2007	79	194	18	46	5	1	1	19	10	32	.237	.279	.289	.568

Not a good projection of the 2007 season. But were we fundamentally right about Erick Aybar, or were we fundamentally wrong?

	G	AB	R	H	2B	3B	HR	RBI	BB	SO	Avg	OBA	SPct	OPS
2007 P	124	466	69	122	24	4	6	69	20	51	.262	.292	.369	.661
2006	34	40	5	10	1	1	0	2	0	8	.250	.250	.325	.575
2007	79	194	18	46	5	1	1	19	10	32	.237	.279	.289	.568
2008	98	346	53	96	18	5	3	39	14	45	.277	.314	.384	.699
2009	137	504	70	157	23	9	5	58	30	54	.312	.353	.423	.776
2010	138	534	69	135	18	4	5	29	35	81	.253	.306	.330	.636
2011	143	556	71	155	33	8	10	59	31	68	.279	.322	.421	.743
2012	141	517	67	150	31	5	8	45	22	61	.290	.324	.416	.740

Here’s our projection for Alcides Escobar, from the 2010 Handbook, compared to what Escobar really did in the 2010 season:

	G	AB	R	H	2B	3B	HR	RBI	BB	SO	Avg	OBA	SPct	OPS
2010 P	141	504	74	145	22	4	5	48	28	77	.288	.326	.377	.703
2010	145	506	57	119	14	10	4	41	36	70	.235	.288	.326	.614

Not too good, huh? Even though our playing time projections for him happened to be right, we missed his batting average by a whopping 53 points. But which was the "accurate" statement of his abilities, and which was the "misleading" statement of his abilities? You decide:

	G	AB	R	H	2B	3B	HR	RBI	BB	SO	Avg	OBA	SPct	OPS
2010 P	141	504	74	145	22	4	5	48	28	77	.288	.326	.377	.703
2008	9	4	2	2	0	0	0	0	0	1	.500	.500	.500	1.000
2009	38	125	20	38	3	1	1	11	4	18	.304	.333	.368	.701
2010	145	506	57	119	14	10	4	41	36	70	.235	.288	.326	.614
2011	158	548	69	139	21	8	4	46	25	73	.254	.290	.343	.633
2012	155	605	68	177	30	7	5	52	27	100	.293	.331	.390	.721
5 years	505	1788	216	475	68	26	14	150	92	262	.266	.307	.356	.663

The fact that his minor league records do not match his rookie-season production is not because his minor league records are misleading; it is because his rookie-season records are misleading.

I remember when Javy Lopez came to the majors 20 years ago, our projections showed him as a .300 hitter. He hit .245 as a rookie—but then settled in and hit around .300 for most of the next ten years.

Look, I hope you don’t think I am trying to mislead you about the accuracy of our projections for rookies. Projections for rookies are problematic. To me, that’s not the real issue. When a young player comes up, what I believe that most people want to know is, what kind of a player is he? Is he a .260 hitter, or a .290 hitter? What does he do well, and how well does he do it? That’s the question that I want to try to answer, to the best of our ability.

There are two issues here: Projection (or prediction), and the clarity or accuracy with which we can perceive the present reality of a player’s abilities.

The importance of this issue, to a major league executive, is this: that if you believe that the skills of a minor league player cannot be accurately assessed based on his minor league hitting record, then you are the captive of the major league population. You cannot blend into your team players who have been stuck in the minor leagues—even very good players who have been stuck in the minor leagues—because well, you just never know what they will hit.

But if you realize that you can project what a player will hit in the majors—allowing for the markdown in the transitional stage—then you realize that you are surrounded by an ocean of talent. You don’t have to sign a mediocre 30-year-old player to an $8 million a year contract. You can get players just as good out of the minor league pool—the minor league ocean—for the major league minimum.

OK, I oversold my case there. The oceans are not teeming with major league talent. 80 or 85% of the players who have been trapped in AAA don’t break through in the majors because they can’t. The transitional difficulties are a sort of "tax" on the usage of minor league players as major leaguers. You can use these players in the majors, yes, but that 28-year-old second baseman you like will be 30 years old before he gets his feet on the ground in the major leagues, and he’s going to cost you for a year before he gets going, and unless he is Ben Zobrist he may drown in the process.

But that doesn’t justify talking about "prospects" as if they were an entirely different species than major league players. There is a continuum of talent that connects Ryan Braun to the JuCo prospect who won’t get drafted. Pretending that there is something "special" about "proven" major league talent merely weakens your position, as a major league executive, and leaves you less able to solve your problems. It is still my position that the distinction between a "player" and a "prospect" is a silly distinction, because it relies on the ignorance of the observer.

COMMENTS (43 Comments, most recent shown first)

tangotiger
"I'd bet dollars to donuts that Group B had better careers in the majors"

Right, that's what I'm saying.
2:06 PM Dec 29th

jemanji
* with respect to uncertainty factor Tango ... I agree your point is 100% right, without reservation, if by 'uncertainty factor' we are emphasizing possible downside scenarios ... these would be reflected in a lower total WAR for group A, the minor leaguers ...

Perhaps it was a bit of confusion here, talking past each other, when we spoke of 20 Wil Myerses who had UP, MID, and LO career scenarios and, when talking about spread, I (or we) forgot to specify "assuming the same average WAR return for the Myerses" ...

Because, intuitively, we think of 20 Myerses as having a couple of huge-upside Pujols guys in there, and quite a few disappointments in there, and when we say "the spread is wider" than for 20 Lawries, in that sense we're right, I believe....

That asterisk that says "assuming the 20 Myerses have the same future WAR as a group" is one whale of a big asterisk ...
.
1:13 PM Dec 29th

jemanji
Certainly agree with you that spread can be an important factor ... Dave's mutual funds example is a classic case ... with the Mariners, Zunino's spread is perceived to be minimal and that's going to factor in to how quickly he's given the job...

I think it was *Bill's* original point that was the platform for the discussion, wasn't it :- ) ...

He said that Myers shouldn't be traded for a discount, and if Myers' mean return is equal to (say) Lawrie's then Bill was right, end of story ...

........................

That said, if you measure twenty minor leaguers in Group A with .350 wOBA MLE's, and twenty major leaguers in Group B with .350 wOBA actual stats ... I'd bet dollars to donuts that Group B had better careers in the majors...
.
1:01 PM Dec 29th

tangotiger
And greater uncertainty level for younger players, players off injury, and forecasting more years into the future.
9:00 PM Dec 28th

tangotiger
I agree that you pay players on the mean, with barely any discount for high spread players. But, that's not the point I was making all this time.

I was simply making the point about the UNCERTAINTY LEVEL and only that. That you increase the uncertainty level based on smaller sample, less familiar surroundings, and different levels of competition. And I was using the lay expression "reliability" as a stand-in for that.
8:58 PM Dec 28th

jemanji
Not really, not in MLB, not to my knowledge. If a team pays more or less for two different 4.0 WAR players, any incorporation of the 'spread' concept would have been strictly intuitive. You ever see a formula capturing that?

Tom on his site does WAR/$ analysis all the time, judging the merit of this or that FA contract, and I've never seen an adjustment for spread or riskiness of performance, given a certain expectation of WAR.

At least 98% of the discussion, there and around the blog-o-sphere centers around mean expectation. So, I assume that his recommendations to ML organizations are based on mathematical formulas that do not capture volatility in the sense we're talking about.

Maybe Fangraphs should have a column next to WAR that specifies REL, as BaseballHQ does, but right now they do not. That tells us quite a bit about how the spread is weighted, relative to the mean.

...............

James stated that he didn't believe that Myers should have been traded for less because of the fact that he was a minor leaguer. If the mean WAR expectation for Myers is the same as that for Lawrie, then 98% of our precedent analysis dictates that we should grant James' point.
.
8:24 PM Dec 28th

studes
Financial analysts sure spend a lot of time and effort worrying about the potential spread and riskiness of an investment. It's a real-world concern to them. I would also guess it's very much a part of player signings too, often playing out in terms of injury risk.
7:28 PM Dec 28th

jemanji
1. wOBA - noted.

2. Re - confidence in the spread as well as the mean - I understand you 100%.

Meaning it with utmost respect, I WONDER if pure math majors have a tendency to overemphasize the spread's importance in real-world applications. SOMETIMES it's more of a theoretical concern, a classroom concern, than it is a practical concern.

After all, if ten Wil Myerses out of the minor leagues come up and hit (as a group) 4,000 homers in their careers, and ten equivalent Brett Lawries also hit 4,000 homers in their careers, then Bill's point seems fundamentally sound -- Myers should be traded for the same amount of booty as Lawrie is.

How VOLATILE a player's ups and downs are is not normally a big part of the question. When you calculate a new FA signing's contract value at InsideTheBook, you just ask "how many WAR is he expected to deliver, on average?" You don't routinely ask, what are the upper and lower bounds of his performance. The mean, that's 90%, 95% of what we are interested in.

Or correct me if I'm wrong :- )
.
5:05 PM Dec 28th

tangotiger
And I use wOBA. OPS is inferior, and I don't want to deal with it. I used OBP in place of wOBA, but anytime I say OBP, I really want to say wOBA. But, I don't want to derail this thread, so ignore what I just said.
4:27 PM Dec 28th

tangotiger
jemanji: If your interpretation of Bill's point is what Bill was saying, then I agree with your interpretation, and I've never said otherwise.

If you forecast the rookie Strasburg with an ERA that is 75% of league average, it's implicit that you mean he'll have an ERA that is 50% to 100% of league average. If you forecast the veteran Verlander with an ERA that is 75% of league average, it's implicit that you mean he'll have an ERA that is 60% to 90% of league average. (Or something to that effect.)

So, when Bill talks about "reliability", I took it to mean our confidence not only in the average, but in the spread. Both of those things.

You seem to be interpreting Bill's reliability comment to rely only on the mean.

Anyway, my point is that the reason we have a larger spread for what we observe from rookies and non-MLBers is that the quality of information isn't as good as the MLB information. It's not as ... well, reliable. It has more uncertainty in what it is really trying to tell us.

So, that's what I keep saying. Minors data has more uncertainty than majors data. You should still be able to hit the mean forecast, but, you'll have a wider range.

Same thing with guys with only 300 career MLB PA and guys with 3000 career MLB PA: more uncertainty with the guy with 300 career MLB PA (even if they are both the same age). We should still be able to hit the mean forecast (of the GROUPS anyway), but we'll see wider results. That is, more uncertainty.

Anyway, this is 33 posts, and if it's just a miscommunication, then that's what it was.

4:26 PM Dec 28th

jemanji
And, since you specified OBP ... let's remember that for Japanese players we were able to forecast their OBP's but not their SLG's. :- )

They could put the bat on the ball and work the count as expected, but they were not able to drive the ball as expected.

.
4:19 PM Dec 28th

jemanji
What's your method for 'forecasting'? Couldn't we remove this moving part in the f(x) machine by --- > simply taking Group A and Group B that had performed 'identically,' the minor leaguers using MLE's, and then observing both groups future results?

But, assuming that the forecast method (minors forecasts vs majors forecasts) loses no horsepower to the back wheels, and granting the methodology as it were, then such a study would suggest ...

... that the 'phonograph horn' of guesstimated future performance was a bigger area for the minor leaguers, more volatility, but that the group ROI was no less than that of the vet MLB'ers, and that Bill's point is essentially correct with an asterisk.

The asterisk/quibble being the implication that Wil Myers was no more or less of a gamble than a vet MLB'ers. He WOULD be more of a roll of the dice, but the average return on the dice would not be diminished.

Further, this dice roll might be a very important factor, because in some roster situations it could be that the negative impact of a "washout" return hurt more than the positive impact of a "surplus" return would help. Of course, the reverse could be true; I'm a Stars and Scrubs guy and I'll roll three dice to try to get a 6.0 WAR player, discarding the washouts as needed. I like volatility in 21-year-olds.

The major point that Bill was making would be established - that ten Wil Myerses, as a group, are as much 'money in the bank' as are ten Billy Butlers. (Unless Myers' 50% return is seen as higher than Butler's current value, of course.)

Seems clear. Is that what you were, um, testing me on? ;- )
.
.
4:14 PM Dec 28th

tangotiger
jemanji:

What would it mean if those 9 guys I listed had a mean forecast of .370 OBP (based on information known from birth to year T), but the average error of their OBP forecast and their observed OBP (in year T+1, T+2, T+3) is .040, but the young MLBers also had a mean forecast of .370 but the average error was .030, and the veteran at-peak MLBers forecasted for OBP of .370 and average error of .020? What would that tell you?

(And, for the sake of discussion, we're not talking about just 9 hitting prospects in one year, but say 40 hitting prospects in ten years, and we identify 40 young MLBers also over those same ten years.)
3:18 PM Dec 28th

jemanji
* Below, I meant to say

Group A = twenty minor leaguers of (say) age 22 with MLE's averaging (say) 800 OPS, normalized

Group B = twenty MAJOR leaguers of (say) age 22 with ML stats of (say) 800 OPS, normalized ... these ML stats occurring after a 400-AB transition period

I think if you run that, and the minor leaguers as a group had ML futures equal to those in the major leaguer group, then Bill's point should be conceded.
.
2:45 PM Dec 28th

jemanji
Right, I realize Tango that you were setting up a study method. I was referring back to the larger issue of whether a Wil Myers is more of a gamble than a James Shields (or Billy Butler, if you want to talk hitters vs hitters). Sorry for zigging against your zag :- )

The study you suggest sounds great ... one question. What if we simply used MLE's for twenty 21-year-old minor leaguers in group A, and major league records for twenty 21-year-old minor leaguers in group B? Then, if both groups had similar MLE's / ML stats, you could look back and see whether the minor leaguers' futures had equalled the major leaguers'.

Seems MLE's would be more to the point than subjective BBA rankings.

There is one complicating factor: the league transition that Bill talks about. If Group B is going through transition, then its age-21 stats will be artificially depressed. You could correct for this by taking ML players who had (say) 400 ML at-bats in the rear-view mirror, but this might require comparing 22-year-olds rather than 21-year-olds.
.
2:42 PM Dec 28th

tangotiger
Checking real quick, if someone wants to study it, Baseball America's top prospects, entering 2003 who never played MLB prior to that season, and born in 1980 or later: Teixeira, Baldelli, Reyes, Mauer, Miguel Cabrera, Kotchman, Morneau, Jason Stokes, Hanley Ramirez. The questions therefore are:
a. what were each of their forecasts for 2003-2005
b. how much over or under were they of their forecasts
c. how does this compare to the four MLB guys identified

Jason Stokes is an obvious huge miss (never even played MLB), and Kotchman probably was very much under the forecast. Cabrera might have performed way over his forecast. So, this is the process to go through, to figure out how much over/under these highly-touted prospects were, and compare that with how much over/under similarly positioned young players already in MLB were.
10:34 AM Dec 28th

tangotiger
There are six players born in 1990 or later, who have substantial MLB playing time: Castro (1912 PA), Altuve (864), Trout (774), Lawrie (707), Bryce (597), Perez (463). After that it's Machado at 202, and Gose at only 189. Myers was born Dec, 1990. So, that's his comp set, not in terms of overall talent as a hitter, but in terms of testing for reliability of our expectations. I think we can make a better forecast for those six guys for 2013-2015 than we could for Myers and five other minor leaguers (with zero MLB experience) of anyone's choosing (whether it's Bill James or a reader who feels really confident).

Of course, we don't have to wait three years to see how it turns out. We can rewind the clock 10 years, go back to 2002, look at players born in 1980 or later, repeat the exercise, then see how all those players did in 2003-2005. Of course, you'd have to select your rookie players without the future knowledge. The MLB players with at least 400 PA in this test: Pujols, Izturis, Felipe Lopez, Austin Kearns.
9:29 AM Dec 28th

tangotiger
jemanji: but I am talking about guys as old as Wil Myers (Castro, Lawrie, etc). I'm not talking about comparing a 21yr old to a 26yr old.
7:14 AM Dec 28th

jemanji
Well, sure, Tango.

You picture a 21-year-old player's forward plot(s) as looking like an old-style phonograph in silhouette - there's a curving upper boundary, and a curving lower boundary, and the first problem is that we don't know how to capture the area within that line, so don't know what a 50th-percentile ROI is.

The second problem is that a 25-year-old player's forward plot(s) captures a much smaller AREA than the projected 'shadow' plot of a 21-year-old -- the phonograph horn of guesstimated future results is just much, much larger than a 25-year-old's.

I think where I tend to agree with Bill ... where I suspect he has a radical point here ... is that the nature of the difficulty with Wil Myers isn't so much that he is a minor leaguer, but simply that he is so early in his career arc. His future 'phonograph horn' is huge, compared to Butler's, just by virtue of the fact that Butler's horn is a single line up through age 26.

On the transition ... next post...
6:41 AM Dec 28th

jemanji
... on the transition thing. We might ask WHY there are real transition problems -- from Tampa to Boston, or Tokyo to Seattle, or Indianapolis to New York. Has anybody asked WHY one Japanese player might hit well in America, and another one might not? Surely it might be related to that player's own ABILITY MATCH against a given player pool?

We all get it - the ML player pool overlaps the minors player pool. I think I read that 85-90% of PCL innings are thrown by pitchers who have recently been in the majors, or who shortly will be. Bill emphasizes this overlap constantly.

Still: isn't there a LEVEL OF PRECISION at the majors that becomes a factor? Suppose Brad Wilkerson can't hit a high pitch, which he can't. Isn't it feasible that the ML player pool delivers a DEGREE OF PRECISION which, in SOME cases, causes a "critical mass" effect for some hitters?

Jeff Clement can't hit a slow inside pitch to save his life. This didn't seem to cost him in the PCL, because pitchers weren't able to deliver that pitch with ENOUGH precision. They'd try, but they'd miss sometimes, and Clement would punish the misses.

Of course it's not true that MLB(TM) plays a different sport than the one played in Japan or the PCL; I've hollered for years that NPB players were undersold. But is it NEVER true that an AAA hitter has a fatal flaw, that imprecise AAA pitchers fail to exploit?
.
6:40 AM Dec 28th

tangotiger
Great!

And in order to test the reliability of the information, you have to match it to out-of-sample data, which in our case is future data (forecasted performance). You take young, non-MLBers Miguel Cabrera, Alexis Rios, Joe Mauer and Jeremy Reed and Justin Morneau and Chris Snelling and Grady Sizemore... take whoever you had originally forecasted to be really really good hitters, and match them with young MLBers who had 400-800 MLB PA and were forecasted to be just as good hitters, and then see how both groups did. If the MLEs were as reliable as the past MLB data, then we'd see the range in performance for the two groups to be similar, if not the first year of the forecast, then at least the second or third year of the forecast.

And my guess is that you'll have much greater spread in performance among the non-MLBers (booms and busts). And this is even though each player was paired with his MLB twin. And the only difference between the two groups is where this information came from (minors only, or mostly majors).
10:48 PM Dec 27th

bjames
Well, I don't agree with that, so I think we have at least isolated the exact point on which we disagree.
9:30 PM Dec 27th

tangotiger
It could be a "transient transition". Say a long-time SS like Jack Wilson going to 2B: he might have a much shorter transition period than say Jimmy Rollins might. But after a year or two, the gap between the two at SS might match that at 2B.

But we're still left that if you have Myers with 400 PA at AA and Lawrie with 400 PA at MLB, the MLB data has to be more reliable, because it has fewer layers of adjustment required. More adjustments = less reliability.
8:28 PM Dec 27th

bjames
I THINK that Tango believes that the transitional changes are permanent and/or real, whereas I believe that they are temporary and/or illusory.
8:14 PM Dec 27th

tangotiger
jemanji: How about plotting the optimistic/pessimistic career arc (or at least next 3-5 years) of Wil Myers compared to Castro, Lawrie and other quality MLBers of the same age? We're saying that the range for Myers is wider, because his MLE is less reliable than Castro et al MLB numbers. The debate should be about how much less reliable, rather than if it's less reliable, simply because the MLE has more "transition" layers to go through (not only for different quality of competition, but, in some cases, drastically different kinds of parks). After all, not all parks affect all kinds of hitters the same.
7:42 PM Dec 27th

jemanji
Well MW, it is clear that the transition period is totally irrelevant to James' basic point. He was saying that Myers' MLE is (approximately) as reliable in assessing his ML performance as an ML history would be, and if there's a transition period that camouflages this, then that's the reality. If Fenway camouflages Roger Clemens' pitching performance, we don't back down from that.

My issue is plotting the career arc of a 21-year-old. If you even so much as allow for a HI, MID, and LO trajectory forward into ages 24, 25, 26, then obviously a 21-year-old is going to be more of a "gamble" than (say) Billy Butler -- who has HAD his ages 23-26 career arc plotted on the graph.

But then the article above turns around and argues that we are assuming way too much about James Shields' past vs. his future. ... the fact that we have a 30-year-old's previous career arc plotted is a factor not to be minimized.

To take one player projection among 1000's, I remember the 19-year-old Griffey in a James player handbook -- Bill said something like "OK, here's my rule. I think it's fair to project a player like this at one or two plateaus above where he is; Griffey already IS a quality major league player. He will probably become a minor star, and he could become an All-Star. Projecting beyond that, it's too early to talk about it."

Granting that it's 20+ years on, there have been 100's of such assessments of young players: James knew, before anybody else in the world, that Frank Thomas was going to be a cleanup hitter. But did he know whether Thomas was going to be Billy Butler vs. Don Hurst vs Mickey Mantle vs Jimmie Foxx?

..............

For me, this is all mostly quibbling about overstatement. I was much more interested in James' belief that ALL young hitters, 99% or 100% of them, with Myers' career arcs, turn out to be middle-of-the-order hitters (barring injury).

It could be that baseball does not have nearly AS MUCH confidence as it should in 21-year-olds who rake AAA. And if so, why aren't a few teams exploiting this?

- Jeff
.
4:28 PM Dec 27th

bjames
The proposition that predictability and the present assessment of skills are one and the same thing is only true if skills don't change, isn't it? If skill sets change, then the accuracy of predictions is also affected by our ability to project how they will change.
4:19 PM Dec 27th

bjames
OK, Weddell's last post there is helpful. I'm not really interested in seeing proof that there is a transitional uncertainty, because I don't have any real doubt about the issue; therefore, it seems like trying to construct a proof of something that is obvious without the proof. An academic exercise. The article (above) acknowledges that the earlier statements about projecting players were inaccurate/poorly phrased/stupid, so I don't really understand why I am still being beat up for those. But it's a cheating husband syndrome. Saying you're sorry isn't NEARLY enough sometimes. :=)
3:23 PM Dec 27th

MWeddell
I think it still likely that we are talking past each other. In Bill’s 6:01 PM Dec 26th comment below, Bill states that he is “interested in the issue of whether there is clear and convincing evidence of the quality of Myers’ ability.” I don’t see anyone on this thread who disputes that point. I was convinced of that when I read the 1985 Baseball Abstract and nothing I’ve read since then has changed my mind.

What I (and Tangotiger and others perhaps) am questioning is the more extreme statements that Bill sometimes makes. See the quote from Bill in my first comment in this thread. Here’s another one from Bill’s comments on the earlier article: “Minor league hitters can be projected as major league hitters as accurately as major league hitters can be projected as major league hitters.” I haven’t read any evidence supporting that more extreme claim, and I produced a citation to some data on a blog post that seems to contradict that claim.

Bill seems to now be saying that he’s not interested in projecting what a player will do as a rookie. He admits that “there is a certain perturbation of the data that takes place during the transitional stage” when a batter moves from the minor leagues to the major leagues, but Bill is more interested in projecting beyond that point. Without any evidence that the perturbation is a temporary phenomenon, I’m skeptical. If there’s less predictive value in an MLE than in actual MLB data when we move from year zero to year one, why would the gap close completely when we look at year two or year three? It seems to me that year zero becomes less relevant when we are trying to project year two or year three performance (we have more past data and more recent past data), but whatever noise was in the MLE when we project from year zero to year one is likely to persist when we later project performance for year two or year three.
3:18 PM Dec 27th

tangotiger
A forecasting system (how a player will play in 2013 based on all known or estimated information prior to Apr 1, 2013 season) is exactly the same thing as saying what the player's true talent level is, averaged out, between Apr 1 and Oct 1, 2013.

That Wil Myers information collected from Apr 1 to Sept 1, 2012 strongly suggests what Wil Myers was capable of doing between Apr 1 and Sept 1, 2012 against a certain level of competition is not really the relevant point (that some of us are trying to make).

We're trying to ask if the limited Myers information known as of Oct 1, 2012 is as reliable, or less reliable, than the Starlin Castro and Brett Lawrie information known on the same date. Reliable with respect to what it tells us about how they will perform in 2013 against the highest level of competition.

If you (Bill James) are trying to say that Myers' information is as reliable against his OWN level of competition (minors) as the Castro/Lawrie information is against their OWN level of competition (MLB), then I don't necessarily disagree with you.

1:12 PM Dec 27th

flyingfish
So, reading MWeddell's and Tangotiger's comments, it seems they agree with yours (James's). One possible reason the correlation between a first MLB season and a second MLB season is higher than the correlation between the MLE and the first MLB season is that there was no transition between the two MLB seasons. To make MWeddell's and Tangotiger's points, they have to eliminate the transition as the cause of the lower correlation.

But it seems that in practice, MWeddell, Tangotiger, and the original sportswriter have a point, even if for the wrong reason (and I'm not sure the reason is wrong). They say there's a bigger uncertainty in projecting how a rookie--i.e., a player new to the ML--will perform next year than someone who already is in the ML because there's a transition! If you take the long view, then no, there's no greater uncertainty, but if the concept of discount rates means anything, then the one or two years of uncertainty that the transition causes is an important consideration.
12:49 PM Dec 27th

bjames
Responding to Tiger. .. .I think there is a test like that that would be helpful, but I don't think that is exactly it. It seems unwise to me to entangle the debate about the reliability of minor league hitting stats with the issue of projection systems. Let me think about it for a couple of days and come up with a better response.
11:31 AM Dec 27th

bjames
Regarding Zduriencik's comment, quoted below:

5. There is a raging discussion in Seattle right now. Zduriencik is telling the public that his prospects can't be traded for decent value until they've shown it in the majors. This applies to the Pineda vs Hultzen player pair, and also (he says) applies to the Seager vs Franklin player pair.

I agree with that. It is my opinion that whenever you trade a player who has the ability to perform in the major leagues but has not had the opportunity to perform, you are trading the player at a discount--in my view an unnecessary discount. I think that's correct.

11:19 AM Dec 27th

tangotiger
Bill: ok, this is the test I propose. Find the top 30 hitting prospects (zero MLB experience) over a ten year period, and show each one's forecast (be it as a rookie or his "peak" forecast).

For each of those hitters, find an MLB player (of similar age and similar forecast, but who has at least 800 PA of prior MLB experience).

How did both groups of players do? Did the two groups:
a. match the overall group forecast?

b. was the spread of the actual performance of the two groups similar?

The contention will be that:
a. if you have a good forecasting system, the two will match (but our expectation is that non-MLB forecasts are way too optimistic, and so, the actual performance will be lower than the forecast for the non-MLBers)

b. regardless as to how good the forecasting system, the spread of the actual performance of the non-MLB group will be wider than the MLB group
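
The two checks proposed above (does each group match its forecast on average, and how wide is the spread of actual results around the forecast) can be sketched roughly as follows. This is only an illustration of the arithmetic: the (forecast, actual) OBP pairs are invented placeholders, not real players or real results, and the function name `summarize` is hypothetical.

```python
from statistics import mean, pstdev

# HYPOTHETICAL (forecast OBP, actual OBP) pairs, invented purely to show
# the arithmetic. They are not real players and not real results.
prospects = [(0.370, 0.410), (0.370, 0.300), (0.365, 0.390),
             (0.375, 0.320), (0.370, 0.385)]
veterans = [(0.370, 0.380), (0.370, 0.355), (0.365, 0.372),
            (0.375, 0.362), (0.370, 0.375)]

def summarize(pairs):
    """Return (mean error, spread of errors) for actual minus forecast."""
    errors = [actual - forecast for forecast, actual in pairs]
    # Mean error answers question (a): is the group forecast biased?
    # Population standard deviation answers (b): how wide is the spread?
    return mean(errors), pstdev(errors)

bias_prospects, spread_prospects = summarize(prospects)
bias_veterans, spread_veterans = summarize(veterans)

print(f"prospects: mean error {bias_prospects:+.4f}, spread {spread_prospects:.4f}")
print(f"veterans:  mean error {bias_veterans:+.4f}, spread {spread_veterans:.4f}")
```

With real forecasts and real outcomes in place of the placeholders, contention (a) would show up as a negative mean error for the non-MLB group, and contention (b) as a larger spread for that group.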

9:43 AM Dec 27th

jemanji
From Nov. 12 ...
Hey Bill, are you surprised that a lot of sports teams keep using the load-up-on-aging-free-agents strategy, even though it seems to fail miserably and expensively most of the time? I mean, adding some veteran pieces around a young or prime-age core is one thing, but counting on oldsters to carry the bulk of the load just seems to be an idea with failure built right into it. When you add in the greater cost of signing veteran players, it seems like a doubly bad idea. Any thoughts?
Asked by: OwenH
Answered: 11/12/2012
Well, yes, but. ...organizations that have resources tend to look to proven solutions. "Poor" organizations are willing to gamble on younger players, and become comfortable gambling on improvement from young players. Wealthy organizations tend innately to look for "proven" players.
.........
1. Thanks for the best site in baseball.

2. My name's Jeff, by the way, and I run seattlesportsinsider.com.

3. We all agree that MLE's are very, very important in forecasting a young player's future, allowing for a 60-240 game transition period.

4. The question is whether there is NO "gamble" involved with young players that is not also involved with, let us say, Billy Butler. So in your reply on Nov. 12, did you forget to make it clear that you were merely characterizing the mistaken paradigm of other ML execs, and omitted to mention that this paradigm is fatally wrong?

5. There is a raging discussion in Seattle right now. Zduriencik is telling the public that his prospects can't be traded for decent value until they've shown it in the majors. This applies to the Pineda vs Hultzen player pair, and also (he says) applies to the Seager vs Franklin player pair.

6. I think you could be right about your "no real gamble exists" paradigm -- but if so, it's a rrrrrradical paradigm and you'd think the Red Sox could make a killing with this cutting-edge insight.

Thanks,
Jeff
.
3:11 AM Dec 27th

bjames
I'm afraid that one went right by me.
9:09 PM Dec 26th

tangotiger
Ok, I think we're getting somewhere. The point of using future data is that that is an unbiased estimate of the true talent of a group of players. So, if you take say the 30 best hitting prospects of the last ten years, and see how they did as a rookie, as a group, your estimate and what we actually see them do as rookies should be pretty much equal. That is their true talent level (for the group). And you can do the same with 30 equally-talented veterans. But the range in observed performance will be much wider among the 30 rookies than 30 veterans.
9:05 PM Dec 26th

davidharris
What is the evidence for transitional uncertainty? A difficult thing to study scientifically; if a player switches teams, and then his performance was more divergent than it normally is year to year, is the larger than usual change in performance caused by transitional uncertainty, or perhaps because the player had an emerging injury that hastened his departure from the first team? Or maybe correlation when switching teams is lower than when not switching teams because a bad performance often causes a switch in teams, and so you get a regression to the mean effect? (It should go without saying that I am talking about transitional uncertainty above and beyond park effects, by the way.)

The reason I am focusing on "transitional uncertainty" and not minor league to major league correlations is because using transitional uncertainty as an explanation for the reduced correlations in the minors to majors study is only valid if such a thing in fact exists. I've seen a lot of these BS arguments through the years, and know that a null effect really is the norm rather than the exception with sports statistics. It's surprising how few factors really make a difference, particularly among "intangibles." An equivalency is with Psychology, which is what I studied: peoples' personalities can be described by five general traits, when most of us would think Personality was so much more complex.

I should read the minors to majors study... Haven't yet. But I hope there are barometers for degree of similarity other than correlation coefficients in that study. Those cannot be used blindly. They are entirely dependent on the population from which they spring. You expect different correlation coefficients depending on how much variance you have in a subject population. If rookies are more homogeneous in their batting stats and abilities than veterans, they will show lower correlation coefficients from year to year, without actually performing more aberrantly. So one has to look at actual misses in predictions, by percentage of OBP and Slugging Average or what have you, and not just at correlation coefficients. Or better yet, maybe, look at slopes from year 1 to year 2 for each variable....Continuing to brainstorm, if all of the players, veterans and rookies, are in the same dataset, with just a flag variable for their rookie/veteran status, I suppose degree of variation among rookies and veterans isn't an issue, and the correlation coefficient is valid. If the two sets of players are in different datasets, you really can't compare with the r.
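
The point about homogeneous populations can be illustrated with a small simulation. This is a sketch under assumptions that are not in the comment: each player's season is modeled as a fixed talent plus independent normal noise, the talent and noise sizes are made up, and the names `pearson_r` and `simulate` are hypothetical. Both groups get the same year-to-year noise; only the spread of talent differs.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate(talent_sd, noise_sd, n_players, rng):
    """Two noisy seasons per player; return (r, mean absolute year-to-year miss)."""
    talents = [rng.gauss(0.340, talent_sd) for _ in range(n_players)]
    year1 = [t + rng.gauss(0, noise_sd) for t in talents]
    year2 = [t + rng.gauss(0, noise_sd) for t in talents]
    r = pearson_r(year1, year2)
    mean_miss = sum(abs(a - b) for a, b in zip(year1, year2)) / n_players
    return r, mean_miss

rng = random.Random(2012)  # fixed seed so the run is repeatable
# Assumed values: same noise (.030) in both groups, different talent spread.
r_wide, miss_wide = simulate(0.040, 0.030, 5000, rng)
r_narrow, miss_narrow = simulate(0.015, 0.030, 5000, rng)

print(f"varied group:      r = {r_wide:.2f}, mean miss = {miss_wide:.4f}")
print(f"homogeneous group: r = {r_narrow:.2f}, mean miss = {miss_narrow:.4f}")
```

Under these assumptions the homogeneous group shows a much lower r even though its players miss their previous season by about the same amount, which is why comparing the size of the misses directly is the safer check.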
8:52 PM Dec 26th

bjames
Right, but this is a study of ROOKIES. I'm not really interested in what Wil Myers will hit AS A ROOKIE; I am interested in the issue of whether there is clear and convincing evidence of the quality of Myers' ability. True, that ability may not manifest itself in 2013, but .. .that's not what I'm interested in. From my standpoint, neither of you has responded to that point, and the study is entirely focused on rookies, thus of limited or marginal relevance to the real issue.
6:01 PM Dec 26th

tangotiger
MWeddell is correct in his point about MGL's study, that the strength of the relationship between successive seasons is stronger for MLB-to-MLB than it is for minors-to-MLB.

Specifically, the correlation (r) that MGL shows for OBP and SLG for the minors-to-MLB is around .38 to .39, while the correlation for those metrics in MLB-to-MLB is .56 to .61. This was even though the number of PA was controlled for (about 450 for minors-to-MLB and 500 for MLB-to-MLB).

This points to a layer of uncertainty based on different levels of competition, and it's quite sizable.
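As a rough way to put those correlations on a common footing, the sketch below converts each quoted r into the share of variance explained (r squared) and the fraction of the original spread left over after a simple straight-line prediction, sqrt(1 - r^2). It assumes an ordinary linear regression of year 2 on year 1, which is an assumption of this example and not necessarily how MGL's study was set up; the helper names are made up.

```python
import math

# Correlations quoted in the comment above (approximate ranges).
quoted_r = {"minors-to-MLB": (0.38, 0.39), "MLB-to-MLB": (0.56, 0.61)}

def variance_explained(r):
    """Share of year-2 variance accounted for by a linear fit on year 1."""
    return r * r

def residual_fraction(r):
    """Leftover spread after the linear fit, as a fraction of the raw spread."""
    return math.sqrt(1 - r * r)

summary = {}
for label, rs in quoted_r.items():
    summary[label] = [(r, variance_explained(r), residual_fraction(r)) for r in rs]
    for r, r2, resid in summary[label]:
        print(f"{label}: r={r:.2f}  r^2={r2:.3f}  leftover spread={resid:.3f}")
```

On this reading an r of .38 accounts for about 14% of the variance and an r of .61 for about 37%, so neither relationship is tight, but the MLB-to-MLB one leaves noticeably less unexplained.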

***

The other point we're trying to make is that if you only have say 300 historical PA for one player and you have 3000 historical PA for another player, then our uncertainty level in the forecast for such players will be smaller for the second guy.

Basically, even if you can forecast both players at an OBP of .400, you are really saying something like: .400 +/- .020 as the true talent level for one guy, and .400 +/- .010 as the true talent level for the second guy. And that uncertainty level is directly attributable to how much evidence we have. And the evidence we have is the number of samples (observed plate appearances).

Now, scouting can reduce all that as well. If say we've got fantastic scouting for the guy with little evidence, and for whatever reason, we are unable to well-scout the other guy, we might even give the guy with only 300 historical PA an estimate of true talent of .400 +/- .005.

So, that's all we're saying: the more evidence we have, the smaller the uncertainty level. The more relevant the context, the smaller the uncertainty level.
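One way to see how the plate appearance count drives the "+/-" is a back-of-the-envelope calculation like the one below. It is only a sketch: it treats OBP as a simple binomial rate, uses a normal approximation, and assumes a prior spread of true talent of .030 across players. That prior figure is picked purely for illustration and is not taken from the comment or from any study, and the function name is hypothetical.

```python
import math

def talent_uncertainty(pa, obp, prior_sd):
    """Approximate standard deviation of a true-talent OBP estimate.

    Combines the random sampling variance of an observed rate over `pa`
    plate appearances with an assumed prior spread of talent (`prior_sd`),
    using the usual inverse-variance (normal approximation) weighting.
    """
    sampling_var = obp * (1 - obp) / pa   # binomial variance of the observed rate
    prior_var = prior_sd ** 2             # assumed spread of talent among players
    posterior_var = 1 / (1 / prior_var + 1 / sampling_var)
    return math.sqrt(posterior_var)

ASSUMED_PRIOR_SD = 0.030  # hypothetical value, chosen only for this example

short_record = talent_uncertainty(300, 0.400, ASSUMED_PRIOR_SD)
long_record = talent_uncertainty(3000, 0.400, ASSUMED_PRIOR_SD)

print(f" 300 PA: .400 +/- {short_record:.3f}")
print(f"3000 PA: .400 +/- {long_record:.3f}")
```

With these assumed inputs the sketch gives roughly +/- .021 for 300 PA and +/- .009 for 3000 PA. The exact figures move with the assumed prior, but the direction does not: more plate appearances, less uncertainty.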

***

The link that Bill pointed to was:
www.insidethebook.com/ee/index.php/site/article/testing_the_2007_2010_forecasting_systems_official_results/

9:43 AM Dec 26th

StatsGuru
You left out the Jeff Bagwell rookie projection. :-)
8:47 AM Dec 26th

bjjp2
Sorry I put "absurd" in your mouth when the word you used was "silly". My dictionary has them as synonyms though.
--Bruce
8:13 PM Dec 25th

MWeddell
Sorry that my comment to the prior article was not clear.

Let's fill in some background first. Statisticians use a scale from -1 to 1 to measure how well correlated pairs of data are. A correlation coefficient of 1 means there is a complete correlation: once you know the first number in the pair (and have looked at the whole data set to infer the general relationship between the numbers in each pair), then you can determine the second number in the pair. A correlation coefficient of 0 means the relationship between pairs of numbers appears completely random. If the correlation coefficient is between zero and one, then the closer the correlation coefficient is to 1, the more useful the first number is at predicting the second number.

According to www.insidethebook.com/ee/index.php/site/article/minor_to_major_correlations/, mgl looked at offensive performance in the first year in the minors, after converting the figures into park-neutral major league equivalencies (MLEs), compared them to offensive performance the very next year in the majors, and computed correlation coefficients. Mgl then created a second data set of players who spent two consecutive years in the majors: he converted the first year figures into park-neutral batting lines, compared them to the offensive performance the very next year, and computed correlation coefficients. For both data sets, mgl required that batters have at least 200 plate appearances in each year.

It turns out that, for several batting statistics including ave, OBP, and SLG, the correlation coefficients were higher in the second set of data. I believe that means that it is easier to predict the second year’s major league performance if a player spent the first year in the majors than it is if a player spent the first year in the minors. This simple study leads me to believe that past minor league performance is not as useful as past major league performance is in predicting future major league performance.

One study of course isn’t conclusive. Maybe Mgl would have gotten different results adjusting for expected aging since players going from the minors to the majors probably are younger on average than players who are in the majors for two consecutive seasons. Maybe his MLE methods or neutralizing park effect methods were flawed. Maybe the sample size was too small. I don’t know; I didn’t try to replicate it. Heck, maybe when I read that the correlation coefficients in the second data set were all higher than the first data set, I misunderstood the implications of that for making predictions.

I am asserting that it is some evidence for believing that it isn’t as easy to predict future major league performance for a batter who spent the previous year in the minors compared to a batter who spent the previous year in the majors. I don’t think Bill is right when he writes a statement such as “We know what Wil Myers is capable of, as a major league player. We know this just as much, in the case of Wil Myers, as we would if Myers had played those 99 games in the majors, rather than in the minors.” Because there is a tighter correlation between two consecutive years of major league performance than between a year of minor league performance followed by a year of major league performance, I believe that Bill’s position is wrong. Minor league batting performance is very useful at predicting future major league performance, but based on the link I posted it does not seem as useful as an equival

6:19 PM Dec 25th

COMMENTS (43 Comments, most recent shown first)
