Stability

By Bill James

October 8, 2014

Before I get into the main part of this article, I wanted to talk for a moment about the practice of establishing statistical parameters for terms that are in general use. I get a little bit of feedback from some of you to the effect that you are uncomfortable with my practice of making up "arbitrary" definitions of terms for the purpose of studying something, and some of you may have questioned whether this is appropriate scientific method. I wanted to say, in that regard: It is absolutely appropriate scientific method, there is nothing questionable about it or debatable about it. It is done in all fields of knowledge, and knowledge could not embrace new areas of study without this being done.

Suppose that you want to study slumps, rank slumps, generalize about slumps. The term "slump" is in wide usage and is generally understood, but it has no precise functional definition. In order to actually study slumps, then, you have to make a practical, objective definition of what is a slump and what is not a slump. In the process of doing that you can make some obvious choices, but you will also have to make some non-obvious choices, some "arbitrary" choices. You have to decide that four losses in five games constitutes a slump (for a team) but that three out of four does not, or something of that nature; there is no avoiding that.

On this site I have made up functional definitions for dozens if not hundreds of terms—when is a team in a slump, when is a run a "manufactured" run, what is a Hall of Fame type season, who is a "Bench" player, etc. I do this pretty much every week, and then I construct ways to measure how deep of a slump it is, or who qualifies as an "ace" of a staff, or what is a "Big" game, whatever. I will create what you might call arbitrary standards to measure these things. I will do that in this article.

But there is nothing about this that is scientifically questionable, in that I have to create what you might choose to call arbitrary standards. It is integral to the process of building knowledge. My life’s work, essentially, has been to build into the baseball conversation as much objective knowledge as it is possible to create. In order to draw something like a slump out of the range of speculation and into the area of knowledge, it is necessary to have a definition of a slump. In order to have a definition of a slump, it is necessary to make some choices that could equally well be made in some other way.

All knowledge is built on these decisions; we merely cease to see these decisions once they become accepted. When I studied economics in the 1960s, experts and would-be experts would debate whether or not the economy was in a slump, excuse me, recession. There was, at that time, no generally accepted definition of when the economy was in recession. Now there is; it’s something like "two consecutive quarters of negative economic growth" or something like that. Now, if you say that the economy is in a recession, some "expert" will point out that, "in fact" we are NOT in a recession, because that definition of a recession has come to be so generally accepted that if you use some other definition, you are assumed to be wrong.

If you think about. . .well, a BTU, a British Thermal Unit. What is it? It is an arbitrary definition which has come to be accepted as a standard measurement. What is an "inch" or a "meter"? It is an arbitrary definition of a unit of length, which we have agreed to use.

The process of science involves not merely proposing these definitions, but also agreeing to them, forming a consensus around them. I can’t form a consensus around my own proposals. A few of my proposed measurements and systems have been accepted by others and embraced by them; more of them have been rejected and/or ignored. Some of them will be embraced later, after I’m retired or dead. I don’t have any control over that. But it is not in any sense a questionable scientific practice to do this. It is a vital part of how knowledge is built.

OK, to the topic at hand. I had this question in "Hey, Bill" several weeks ago from someone named Jason:

Hey, Bill. How much turnover is there in the league from year to year? In other words, how many plate appearances and pitcher innings in 2013 were made by guys who played in 2012? Does the rate of turnover tell us anything about the quality of play?

This connected to several other issues that I had been thinking about, and I wound up doing five studies related to the general issue of stability in baseball.

I. Position Player Stability

Bill Freehan had 635 plate appearances in 1968, 555 in 1969. Comparing 1969 to 1968, then, there are 555 "stable plate appearances" that unite the two seasons, accounted for by Bill Freehan—and equal and larger numbers accounted for by Hank Aaron, Luis Aparicio, Glenn Beckert, Johnny Bench, Bert Campaneris, Jose Cardenal, etc., etc.; you get my point. Norm Cash had 458 plate appearances in 1968, 556 in 1969; thus, 458 of his 556 1969 plate appearances count as "stable" or "carryover" plate appearances.

By adding up these individual totals, we get the numerator of the carryover equation, but what is the denominator? We can say that the denominator is the total number of plate appearances in the year, but which year: 1968, or 1969?

To figure "1969" stability, I use the total plate appearances in 1969 as the denominator—the bottom half of the equation—but this is not a perfect answer. When there is a strike, for example, this causes the "carryover percentage" to spike suddenly upward. This number was 78% in 1978, 79% in 1979, and 78% in 1980, but with the strike in 1981 it jumps to 87%. What we ordinarily think of as a destabilizing event—the strike—shows in this approach as a stabilizing event. It makes a less accurate measurement.

To avoid that we make a second measurement, which is to divide the number of "carryover at bats" (technically carryover plate appearances) by the greater of the two numbers: Plate Appearances in 1980, and Plate Appearances in 1981. That creates a lowball estimate of the carryover percentage, so then we take the average of the two. Finally, we make a five-year rolling average of the one-year figures.

The 2012-2013 carryover percentage is 75%, almost exactly three out of four. In modern baseball, three-fourths of plate appearances in each season are accounted for by players who had at least an equal number of plate appearances in the previous season. This number has been higher in the past, and it has been lower. Later, we’ll look at how these various measures of stability have changed over time. Right now, I just wanted to explain the process and tell you that the current carryover percentage is 75%. We’ll call that Position Player Stability.
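As a rough illustration, the carryover calculation described above might be sketched in Python as follows. The function names and the tiny sample league are hypothetical, and the trailing form of the five-year average (with shorter windows for the earliest seasons) is an assumption, since the text does not say how the first years are handled.

```python
def carryover(prev, curr):
    """Carryover plate appearances: for each player who appears in both
    seasons, count the smaller of his two season totals."""
    return sum(min(pa, prev[name]) for name, pa in curr.items() if name in prev)


def one_year_stability(prev, curr):
    """Average of the two estimates described in the text."""
    carry = carryover(prev, curr)
    total_prev = sum(prev.values())
    total_curr = sum(curr.values())
    by_current_year = carry / total_curr                  # spikes after a strike
    by_larger_year = carry / max(total_prev, total_curr)  # the lowball estimate
    return (by_current_year + by_larger_year) / 2


def rolling_average(values, window=5):
    """Trailing average. Early entries use however many seasons exist
    (an assumption, not something stated in the article)."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


# Hypothetical two-player league, plus one newcomer in 1969.
pa_1968 = {"Freehan": 635, "Cash": 458}
pa_1969 = {"Freehan": 555, "Cash": 556, "Newcomer": 100}

print(carryover(pa_1968, pa_1969))  # 555 + 458 = 1013
print(one_year_stability(pa_1968, pa_1969))
```

For pitchers the same functions would apply with innings pitched in place of plate appearances.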

II. Pitcher Stability

For pitchers, we use the same process, except that we use innings pitched rather than plate appearances. Pitcher stability now is 69%, or 91% that of position players. I am surprised that this number is as high as it is. I would have guessed that there was more separation than there is between the turnover rate for pitchers and that of hitters.

Three more observations about pitchers:

1. While I was doing this, because it was easy, I also figured the stability of WINS, wins credited to pitchers, as well as innings. Obviously the stability of wins as opposed to innings would have to be lower, but it’s not all that much lower. The carryover stability of wins is now 61%. In the past, it has been as high as 68% (in 1933, 1934, 1974 and 1976.)

2. There are two concepts here, carryover and turnover. Turnover is the complement of carryover; turnover and carryover add up to one. I will also refer to "carryover" as "stability". Carryover for pitchers now is 91% that of position players (69/75), but turnover for pitchers is 24% higher than for position players (31/25).

3. Here is something I didn’t know, something none of you knew; a new fact for you. Every type of instability in the game causes pitcher carryover to rise relative to position player stability. Historically, pitcher carryover has usually been about what it is now, about 91% of position player carryover, but every time there is any kind of "disruption" in the game, this ALWAYS goes up; there is no exception that I see. Expansion. . .the pitcher carryover goes up relative to the position player carryover. The strikes (1981, 1994, 1995). . .pitcher carryover goes up relative to position player carryover. World War II. . .same thing. The new league in 1900, the Federal League. . .always. Sometimes pitcher stability, in these transitional periods, will temporarily be higher than position player stability.

It makes sense when you think about it, because pitcher usage is more flexible than position player usage. When there is something like an expansion, teams are forced to bring in new position players, but with pitchers, teams can adjust more by giving another chance to pitchers already in the league and by extending their innings a little bit. The guy who goes 11-14 with a 4.20 ERA might be out of the league the next year, creating turnover, but with expansion, he gets to keep his innings. With the Federal League and the folding of the Federal League in 1914-1916, then, pitcher stability spiked to 94-96% of position player stability, in 1943 to 103%, in 1946 to 105%, in 1962 to 98%, in 1982 (after the 1981 strike) to 98%, in 1995 to 97%. Then the relationship always goes back to normal within a few years.

III. Franchise Stability

My third stability study has to do with the stability of franchises, and here I had to make some of those "arbitrary" determinations that I wrote about earlier, in order to measure how Franchise Stability has changed over time.

How do we measure Franchise Stability? Let’s start with the assumption that a franchise is "mature" after they have been operating in the same city for 50 years. In other places I have said 40 years, but 40 years is light; after 40 years a franchise may be said to be mature, but is it absolutely mature? I think there is still a little bit of work to be done for that franchise.

We start, then. . .well, let’s do the Mets. In 1962, the expansion year, the Mets had NO history; we therefore enter the "stability" of that franchise, for that season, as 0 over 50. In 1963 we enter it as 1 over 50, in 1964 as 2 over 50, etc.; in 2012 we finally regard the Mets as a completely stable and mature franchise, with an entry for that team in that season of 50 over 50.

Franchise stability is a fan-based concept. It means that the fan sees the same things that he remembers from his youth, that the teams he remembers being there last year and the year before and the year before are still the teams that are there now.

You could ask why we terminate this "maturity" at 50 years. This is done to prevent what we might call the cartoon statistician’s error. You remember the old line about a statistician: A statistician is a person who, if you have one leg in a fire and the other in a block of ice, will tell you that, on average, you’re comfortable.

The same problem, kind of. If you have one franchise which is 100 years old and another which is brand new, that is NOT the same as having two franchises which are both 50 years old. Two 50-year-old franchises is two mature franchises; a 100-year-old franchise and an expansion team is one mature franchise and one expansion team. We terminate the year counts at 50 so that we recognize this difference.

An expansion team tends to strike the fan as a commercial product, created to try to get his money. An old, established franchise seems to simply be there, to be indigenous to the city where it is. I went to their games with my grandpa; it’s just always been there. That is the meaning of a stable franchise: I went to their games with my grandpa, when I was a kid.

Anyway, the Mets now are entered every year as 50/50. For Colorado in 2014 we enter them as 21/50, for San Diego and the Royals, as 45/50, for Oakland as 46/50, for Arizona as 16/50, etc. So far, so good, but there are unresolved issues.

What do we do about franchises that no longer exist? Washington is entered for 2014 as 9/50, but what do we do about Montreal? The Montreal team had fans; it had a little history. It had an emotional investment. Their fans were hurt when the team was taken away. A team which is no longer there is not NOTHING; it is a marker of instability. What do we do about that?

Here’s what I did. When a team moves away, I enter that, in the first year of their absence, as 0 over 50. Montreal is entered for the 2005 season as 0 over 50, then in 2006 as 0 over 49, in 2007 as 0 over 48, etc. The memory of the team "decays" at a rate of one point per season until it is gone. The Montreal franchise will still count against baseball’s Franchise Stability Index until 2055; it won’t count much against them after a few years, but it’s still there.

I still remember the Kansas City A’s. I still remember how disappointed I was when they left for Oakland in 1967. We enter that franchise as 0 over 50 in 1968, as 0 over 49 in 1969, 0 over 48 in 1970, and as 0 over 4 in 2014. That memory is almost gone, but it isn’t entirely gone.

For a few teams, 50 years isn’t really enough. People still mourn the Brooklyn Dodgers; it has been 57 years. For other teams, 50 years may be more than enough. I used a 50-year decay cycle.

Well, but what about the Seattle Pilots, who existed for only one year? Do they have a 50-year decay cycle, as well?

That would be unrealistic, and there are lots of those franchises in baseball history; baseball history is littered with dozens of "franchises" that lasted only one or two years. Here’s what I did. If a franchise existed for three years or less, then I let their "memory" decay at a rate of two points per year, rather than one point per year. The memory of a team like the Seattle Pilots fades a great deal more quickly than the memory of a team like the St. Louis Browns, who were in St. Louis for decades before moving to Baltimore.

The Federal League was an important event, in baseball history; it disrupted the game for a long time, and it left deep marks on the game. But it lasted only two years (plus one year that it operated as a minor league), so its memory was gone by 1940—whereas without this "short franchise" rule, we would still have been counting the Federal League teams against baseball’s Franchise Stability Index in 1960 or 1963. That seems unrealistic.

One more problem here. What about teams that switch leagues? Houston used to be in the National League; now they’re in the American League. The franchise is an old friend, but the opponents are strangers. That’s instability. What do we do about that?

I charged a team a 10-point penalty when they switched leagues. The Astros franchise started in 1962; they reached maturity (50/50) in 2012. But in 2013, because they moved to the American League, I entered them as 40/50. The exception is a team that had existed for less than ten years before it switched leagues; when that happened I divided their "positive history" number in two, but rounded it up, so that, for example, if the team switched leagues in their 8th season, the count would go 0, 1, 2, 3, 4, 5, 6, 4, 5, 6.
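The bookkeeping rules above can be sketched in a few lines of Python. This is only one reading of the rules: the function names are hypothetical, and treating "less than ten years" as "fewer than 10 points of accumulated history" is an assumption.

```python
import math

MATURITY = 50  # seasons needed for a franchise to count as fully mature


def active_credits(first_season, last_season, switch_seasons=()):
    """Season-by-season credit (out of 50) for a franchise that stays put.

    switch_seasons lists the seasons in which the team changed leagues.
    """
    credits = {}
    credit = 0
    for season in range(first_season, last_season + 1):
        if season > first_season:
            credit = min(credit + 1, MATURITY)
        if season in switch_seasons:
            # 10-point penalty, or half (rounded up) for a young franchise.
            credit = credit - 10 if credit >= 10 else math.ceil(credit / 2)
        credits[season] = credit
    return credits


def departed_denominator(first_season_absent, season, seasons_existed):
    """The N in the "0 over N" entry still charged for a departed team."""
    decay = 2 if seasons_existed <= 3 else 1  # short-lived teams fade faster
    return max(0, MATURITY - decay * (season - first_season_absent))


astros = active_credits(1962, 2014, switch_seasons={2013})
print(astros[2012], astros[2013], astros[2014])  # 50 40 41

young = active_credits(1, 10, switch_seasons={8})
print(list(young.values()))  # [0, 1, 2, 3, 4, 5, 6, 4, 5, 6]

print(departed_denominator(2005, 2014, seasons_existed=36))  # Expos: 41
```

Simulating season by season, rather than computing a closed-form value, keeps the league-switch penalty and the 50-point cap easy to apply in the order the text describes.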

Nagging questions. . .what if a team changes their name, like the Devil Rays becoming the Rays or the Florida Marlins becoming the Miami Marlins? That’s nothing; I didn’t pay any attention to that. Also, what about the Washington Senators in 1960/61? The Washington Senators existed in 1960 and in 1961, but the "old" Senators moved to Minnesota in 1961, becoming the Minnesota Twins.

At the time this happened, the American League announced that the expansion Senators would inherit all of the franchise records, logos, etc. of the original Senators. Modern statistical guides have generally ignored this edict, and have usually presented the expansion Senators as a "new" team, and have sometimes credited the franchise records of the old Washington Senators to the modern Twins, despite the American League’s announcement that it was to be done the other way.

I decided to honor the original rule, and to treat the expansion Senators as a continuation of the original Senators, and the Twins as a new team. This seems to me more reasonable. They’re called the Washington Senators both seasons, they play in the same park, they wear the same uniforms. From the fan’s standpoint, that’s the same team. The fact that it is legally a different entity. . .so what? I’m not a lawyer; I don’t care what the lawyers call it. From the standpoint of a fan, it’s the same team.

So, for 2014, the Franchise Stability Index for major league baseball is 81%, or 1258 over 1555. Sixteen teams are now fully mature, each registered as 50 over 50. The Braves are 48 over 50, Oakland is 46 over 50, San Diego and Kansas City are 45 over 50; those are nearly mature franchises. Texas is 42/50; Houston, because they switched leagues, is 41/50. The 1977 expansion teams, Toronto and Seattle, are now 37/50. That accounts for 24 of the 30 teams.

The Brewers, who also switched leagues, are now at 34/50. Colorado and Florida are 21/50, Arizona and Tampa Bay are 16/50, and Washington is 9 over 50, or 18% mature (as a franchise.)

That adds up to 1258 over 1500, but we still have to account for the franchises that are a part of the fan’s memory, but are no longer there. There are now four of those. The Milwaukee Braves are counted as 0 over 2 (since they moved 48 years ago), and the Kansas City A’s as 0 over 4. The Washington Senators are entered as 0 over 8, and the Expos are entered as 0 over 41. That makes 55 points for dead franchises, so that makes the Franchise Stability Index 1258 over 1555, or 81%.
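The 2014 tally can be checked with a short Python sketch. The entries are the ones listed in the text; the variable names are just for illustration.

```python
# Credits (out of 50) for the 30 active franchises in 2014, as listed above.
active_2014 = [50] * 16 + [48, 46, 45, 45, 42, 41, 37, 37,  # first 24 teams
                           34, 21, 21, 16, 16, 9]           # remaining six

# "0 over N" entries for departed franchises: Milwaukee Braves,
# Kansas City A's, Washington Senators, Montreal Expos.
departed_2014 = [2, 4, 8, 41]

numerator = sum(active_2014)
denominator = 50 * len(active_2014) + sum(departed_2014)
index = numerator / denominator

print(numerator, denominator, round(100 * index))  # 1258 1555 81
```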

In general terms, this approach could be used to study franchise stability, and thus the effects of franchise stability, in college sports. The college sports world in the last decade has seen constant and dramatic conference re-alignments, teams hopping from the Big 12 to the SEC, from the ACC to the Big 10, playing in one conference in one sport and a different conference in a different sport, conferences folding and expanding. . .it’s been crazy. It will take generations before it seems normal. An approach like this could be used to create a "Franchise Stability Index" for college sports. We’re at a low point.

IV. Schedule Stability

My fourth stability study had to do with the stability of the schedule. What I am trying to get to here is, is the team playing what appears to the fan to be a stable, dependable schedule?

Every percentage needs a numerator and a denominator. The denominator here is fairly straightforward: it is the number of games played by each team in a full schedule—that is, 162 in modern baseball.

The numerator is more complicated, and relies on decisions that are intended to mimic in numbers what is sensed by the fans—an obviously impossible task at some level, but I used certain rules. If all of the teams actually played out their schedule and have done so every year for years, that number also would be 162, and that would represent 100% schedule stability.

But there are rules, and limitations. The first rule is that the "credit" number—the numerator in the equation—cannot increase by more than 4 per year. In other words, let’s say that there was a strike in 2015, and the average team played only 50 games, then the numerator in 2015 would be 50—but the number in 2016 would be 54, and the number in 2017 would be 58, even if the schedule was being played out normally.

I think this represents the real world. It takes a long time for the memory of a strike to fade away, for the point to be reached at which it doesn’t matter anymore. The more serious the disruption to the schedule, the longer it takes for a sense of normalcy to return to the game.

Also, I charged a "penalty" or a "setback" whenever baseball deliberately alters its schedule: for example, when the leagues split into divisions in 1969, when the Wild Card was added in 1994, and when the second Wild Card was added in 2012. Each of these things (and some others) creates a 20-point setback in the "credit" number of the Schedule Stability Index.

But finally, the Schedule Stability Index for a season is not simply the numerator and the denominator for that season; it is, instead, the average of those figures for the last five seasons. So, with that structure, it takes a long, long time for a disruption of the schedule to disappear from the Schedule Stability Index. The disruption of the 1981 strike was still there—it was still a part of the vital memory of the game—when the 1994/1995 strike happened; we weren’t really close to getting back to 100% by 1994, when the second strike set us back. The Schedule Stability Index was at 100% in 1980, but it has never gotten back there since; we’re still about ten years away from the point at which the memory of those (and other) disruptions has faded from the public. Ten years, if nothing else happens.
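A minimal Python sketch of these rules, under stated assumptions: the function and parameter names are hypothetical, the full schedule is held at 162 games for every season, the credit is assumed to start at a full 162, and "the average of those figures" is read as the average of the five yearly ratios. The sample seasons reproduce the hypothetical 2015 strike described above.

```python
def schedule_credits(games_played, full_schedule=162, start_credit=162,
                     setback_seasons=(), max_gain=4, setback=20):
    """games_played maps season -> average games actually played per team."""
    credits = {}
    credit = start_credit
    for season in sorted(games_played):
        # The credit can rise by at most max_gain per year and can never
        # exceed what was actually played or the full schedule.
        credit = min(credit + max_gain, games_played[season], full_schedule)
        if season in setback_seasons:
            credit = max(0, credit - setback)  # deliberate schedule change
        credits[season] = credit
    return credits


def schedule_stability_index(credits, season, full_schedule=162, window=5):
    """Average credit ratio over the last `window` seasons up to `season`."""
    seasons = [s for s in sorted(credits) if s <= season][-window:]
    return sum(credits[s] / full_schedule for s in seasons) / len(seasons)


played = {2014: 162, 2015: 50, 2016: 162, 2017: 162}
credits = schedule_credits(played)
print(credits)  # {2014: 162, 2015: 50, 2016: 54, 2017: 58}
print(schedule_stability_index(credits, 2017))
```

Because only four seasons are supplied here, the index for 2017 averages four ratios rather than five; with a full history the window would always hold five seasons.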

V. Run Environment Stability

My final stability measurement is Run Environment Stability. When baseball’s run environment wanders off from historic norms, the game starts to seem like a stranger, rather than an old friend. When the run environment changes rapidly, that’s instability. When the run environment is steady across a long period of time, that’s stability.

We form the Run Environment Stability Index essentially by asking three questions:

1) Is the Run Environment now consistent with what it has been over the last ten years?

2) Is the Run Environment now consistent with what it has been over the last 50 years?

3) Is the Run Environment now consistent with the norm over all of baseball history?

The norm in all of baseball history is 4.61 runs per team per game; that is, if you figure the average for each season and then average all the seasons, you get 4.61 runs per game; you get a slightly different number (4.54) if you figure all the games, rather than all the seasons. If the run environment in a season is exactly 4.61, as it was in 2009, for example, then that’s 100% stability on that measurement in that season.

If it differs from this number by 1.50 runs, then that would be zero stability, since that number (6.11 runs per game or 3.11) would be wildly out of range with historic norms. It’s actually 4.08 in 2014 (as of September 19), so that’s 53 points off, so we enter that as 97/150, or 64.7% stability on that indicator, the "all of baseball history" indicator.

Over the last ten years the average is 4.51, so we’re 43 points off on that, so we enter that as 107/150, or 71% stability versus the last ten years. Over the last 50 years the average is 4.38, so we’re 30 points away from the 50-year average, so we enter that as 120/150, or 80% stability on the 50-year indicator.

Then we combine these three estimates into one; 64.7%, 71.3% and 80% average out to 72%. Then we form a rolling five-year average of those figures, since it would make no sense at all to say that the run environment was unstable in 1951 but stable in 1952, and that rolling five-year average is the Run Environment Stability Index for that season. For 2014 that figure is 81%.
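The single-season arithmetic for 2014 can be sketched in Python as follows. The function name is hypothetical, and the final rolling five-year average (the 81% figure) is not reproduced because the earlier seasons' values are not given in the text.

```python
def closeness(current, baseline, zero_at=1.50):
    """1.0 when the run level matches the baseline, falling linearly to 0.0
    once the gap reaches zero_at runs per game."""
    return max(0.0, 1.0 - abs(current - baseline) / zero_at)


runs_2014 = 4.08
baselines = {"all-time": 4.61, "ten-year": 4.51, "fifty-year": 4.38}

scores = {name: closeness(runs_2014, base) for name, base in baselines.items()}
combined = sum(scores.values()) / len(scores)

for name, score in scores.items():
    print(name, round(100 * score, 1))
print("combined", round(100 * combined, 1))
```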

VI. The Missing Sixth

There is a sixth "stability index" that we could envision but which I have not created, which would be the Rules and Play Stability index. I can see how this could be created; it’s just work, and I haven’t had time to do the work.

When the rules don’t change for a long period of time, then the rules can be said to be stable; when the rules do change, then they can be said to be unstable. Suppose that you have a 100-point scale, and you evaluate every rules change as major, minor or moderate. When there’s a major rules change, you whack the index by 30 points; a moderate change, 15 points, a minor change, 5 points. Then you move back toward normalcy at a rate of perhaps 1 point per year or something like that; if there is a 35-year-period without any significant rules changes, then you can get to 100% rules stability, whereas if there is an accumulation of rules changes over a period of years, then you can get pushed further and further away from having a stable rules environment.
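Since this index has not actually been built, the following Python sketch only illustrates the scheme as outlined: a 100-point scale, penalties of 30, 15 and 5 points, and recovery of one point per year. The function name, the starting value of 100 in 1960 and the sample classification of changes are all assumptions made for the example.

```python
PENALTY = {"major": 30, "moderate": 15, "minor": 5}


def rules_stability(first_season, last_season, changes, start=100, recovery=1):
    """changes maps season -> list of change sizes ("major", "moderate", "minor")."""
    index = {}
    level = start
    for season in range(first_season, last_season + 1):
        if season > first_season:
            level = min(100, level + recovery)  # drift back toward normalcy
        for size in changes.get(season, ()):
            level = max(0, level - PENALTY[size])
        index[season] = level
    return index


# Illustrative inputs only: strike zone (1963), mound height (1969), DH (1973).
sample_changes = {1963: ["moderate"], 1969: ["moderate"], 1973: ["major"]}
sample_index = rules_stability(1960, 1980, sample_changes)
print(sample_index[1963], sample_index[1969], sample_index[1973])  # 85 76 50
```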

Of course, the DH rule would have to be evaluated as a major change. The instant replay/appeals system would be either minor or moderate, I think, and the attempt to fix the collisions at home plate would clearly be minor. Requiring players to wear batting helmets might have been a moderate rules change; requiring the helmet to have an ear flap would clearly be minor. Changing the strike zone in 1963 and changing the height of the mound in 1969 would be moderate changes, not insignificant but not huge.

But there is this about the DH: that having the rule be different in the two leagues represents a kind of "permanent instability" that can’t be wiped out with the passage of time. As long as one league is using one rule and the other league the other rule, then you can’t get back to 100% stability; there has to be some proviso in the system to acknowledge that there is a permanent instability there.

In my opinion, changes in how the game is played should be accounted for in the same system as rules changes, because I think the effects of the two are interactive. If there is artificial turf in one decade and no artificial turf the next, that’s instability. If there are players stealing 100 bases a year in one decade and you can lead the league with 30 steals ten years later, that’s instability. If there are four-man pitching rotations in one decade and five-man the next, that’s instability. If there are 60% complete games in one decade and 20% the next, that’s instability. All rapid changes in the game are perceived by the fan as instability.

As to why these should be accounted for along with changes in the rules, let me explain it this way. In the American League in 1913 there were 127 relievers used per 100 games played. In 2013 there were 577.

This has nothing to do with the rules changing—but it is a very dramatic alteration in the game. The only relevant rules change, the DH rule, actually reduced the number of pitching changes, but the number has gone up more than fourfold anyway. The game has changed because the rules have allowed the game to change in this respect.

It would have been far better, in my opinion, to have put in place a set of rules in 1913 that would have limited how many pitchers a team could use in a game. Since this wasn’t done in 1913 (or 1933, or 1953, or 1973, or 1993), it would be a good thing to put in those rules now. The purpose of that rules change isn’t to change the game; it is to prevent the game from changing into something we no longer recognize—which is where we are going. In 30 years, we’re going to be using six pitchers per team per game, if we don’t step up to the plate and prohibit that from happening.

If the games are played in an average of two hours in one decade and three hours in another, that’s instability. Any changes in the game that people complain about represent some significant form of instability. It would not be tremendously difficult to document, chart and measure these changes, but it would be work, and I haven’t done the work.

A seventh measure of stability/instability could be economic stability. People complain about the players hopping from team to team; we could measure that. There are many other types of stability/instability that we could study, but there are these five that I have studied. In the next seven sections of this article, we will look at how these stability indexes have changed over baseball history.

VII. The Early Years (1876 to 1899)

The National League was formed from the wreckage of the National Association in 1876. The first quarter-century of "major league" baseball, after that, was marked by the frequent formation and collapse of new leagues, and by the constant creation and elimination of new teams.

The National League started with eight teams, but the two biggest cities in the league (New York and Philadelphia) were kicked out of the league after one season, mostly because the league president was a Chicago loyalist who wanted the league to be dominated by the Chicago team. The teams put into the league to replace New York and Philly were weak, and after another year the league shrank to six teams.

But the teams made a profit; despite their struggles they made money. New teams came and went almost every year. In 1882, because baseball was making money, there was a new league, the American Association, which grew to include some very large number of teams, and in 1884, a third league, the Union Association.

The Union Association was a complete joke; there was nothing about it that in any way resembled a major league, and it folded after one season. The Union Association, however, can be used to argue that we were wrong to let the memory of no-longer-functioning teams decay to nothing after 25 years if they were only around for a couple of years. The reason that the Union Association is included in Encyclopedias as a major league today, when there were better leagues which are not included, is that a young boy who was a fan of the St. Louis team in the UA, Ernest Lanigan, would compile the first effort at a baseball encyclopedia. Because he loved that team—even though it had been more than 25 years—Lanigan included that league as a major league. Every other Encyclopedia since then has followed him in this (ridiculous) listing, and the UA is referenced by everyone now as a major league. So the memory of a team that functions for only one season doesn’t always decay to nothing after 25 years; Lanigan proves this.

Teams continued to spring up and fold throughout the 1880s, but the 1880s were a good decade for baseball, a decade of growth and development. But the 1880s were also a decade, all across America, of very active and often violent labor strife. Labor unions were being organized and violently resisted in mining, in factories, in the railroad industry, and in baseball. In 1890 the baseball players’ union revolted and formed their own new league, the Players’ League. That was a disaster, and the league folded after one season. The two surviving leagues made an arrangement to continue, and after another season (after 1891) they merged into one twelve-team league. The twelve-team league operated through the rest of the 1890s. After the 1899 season four of those teams were eliminated, taking "major league" baseball in 1900 back to one eight-team National League, exactly where it had started 24 years earlier.

Now, our stability indexes. Teams were acquiring history at a rate of 2% per year, so if the eight teams which had formed the National League in 1876 were still the teams operating in 1900, the Franchise Stability Index by 1900 would have been close to 50%. Because this was anything but true, the Franchise Stability Index went upward only very slowly throughout this period. It reached 2% in 1877, and was still at 2% (or back at 2%) in 1884. It increased to 4% by 1889, dropped back to 3% with the Players’ League fiasco, then gradually increased again. By 1899 the Franchise Stability Index was up to 8%.

By 1899, however, baseball had established a core of professional players who were familiar to the fans. The Position Player Stability Index, which started at zero, had increased to 56% by 1881, meaning that the players who were there in 1881 were mostly the same players who had played in 1880. When there was a new league in 1882, of course, that meant a whole new crop of players and the numbers went down for a couple of years, but by 1889 the Position Player Stability Index was up to 73%, very close then, 125 years ago, to what it is now. The Position Player Stability Index was almost fully mature by 1889.

The Pitcher Stability Index didn’t mature quite as rapidly, but by 1889 it was 56%, not a tremendous distance from where it is now. These numbers went down, of course, with the Players’ League and the folding of the American Association, but then progress resumed. By 1899 the numbers were 70% (position players) and 64% (pitchers).

The 1880s, as I said, were a good decade for baseball, but there were gigantic rules changes on a regular basis—as in, changing the number of balls required for a walk. These changes caused run scoring levels to rise and fall. The Run Environment Stability Index reached 43% in 1885, went down, and was back at 43% in 1892.

Then, of course… most of you know this story, so I’m sorry… but then, of course, they moved the pitcher’s mound back from 45 feet to 60-feet-6. For a couple of years the league had more .400 hitters than sportswriters. In 1894 Hugh Duffy hit .440 or something like that; Billy Hamilton hit .403 and scored almost 200 runs in 132 games. Philadelphia had four outfielders who hit .400—Ed Delahanty (.404), Sam Thompson (.415), Billy Hamilton (.403) and Tuck Turner, a fourth outfielder who hit .418 in 82 games, scoring 95 runs.

With these extraordinary numbers of runs scored, of course, the Run Environment Stability Index dropped sharply, from 43% to 26%. By 1899 it had recovered to 42%.

That leaves the Schedule Stability. In 1876 National League teams played 65 games each, on average; actually they played more than that, but they played a lot of "exhibition" games against non-league opponents. In 1876 the point had not been reached at which fans took the "league" competition seriously enough that the league games drew better attendance than the "exhibition" games, or else the teams did not realize that this point had been reached. Over the first decade they gradually replaced exhibition games with league contests. By 1888 the teams were playing an average of 137 games apiece; by 1892, 154.

Schedule stability was increasing gradually over this period, but only gradually, and "schedule stability" requires a memory. Schedule stability requires that teams play the schedule that people remember them playing years ago—and then a funny thing happened. In 1892 the National League decided to play a split schedule, with a first-half champion and a second-half champion. Our system requires that, when the league deliberately changes the schedule, we charge them a 20-point penalty for that change in the numerator of the schedule stability equation. By 1899 Schedule Stability had only reached 44%.

The 1890s were a God Awful decade for professional baseball. A system of "syndicate ownership", in which the same men owned large shares of stock in multiple teams, resulted in constant efforts to "fix" the pennant race, very much as is often done in Strat-o-Matic or APBA leagues. If a gang of investors owned two teams and one of them was good and the other weak, they would move all of the good players to the better team. This caused the league to dissolve into a collection of extraordinarily good and extraordinarily terrible teams, with very, very poor pennant races. The game on the field also got quite ugly, with constant fights, bad language being used in the presence of the fans, and players intentionally trying to injure other players. Baseball players—like NFL players today—got the reputation of being a bunch of disreputable ruffians.

I once wrote something very much like that, 30 years ago, and I remember an argument or discussion that I got into with a reader who wrote to defend the 1890s league. Yes, he said, there were certain things about that league that were not good, but the "Big League"—the 12 team league of the 1890s—brought baseball stability for the first time. For the first time, teams played an organized schedule without new teams and new leagues constantly forming. This was the term he used, that baseball had "stability" for the first time.

But now that I have a way to actually measure stability in baseball for the first time, I can see how completely wrong he was. If you combine these five stability indexes into one, you can see that 1890s baseball, far from becoming stable for the first time, actually had exceptionally small improvements in stability in that decade, compared to the preceding decade or the ones that followed. The overall index of stability in baseball, combining the five indexes, went from 0% in 1876 to 23% by 1879, to 35% by 1884, and to 43% by 1889. Then it went down and then recovered, but it didn’t get back to 43% until 1898, then it did tick up to 46% in 1899.
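As a rough check on the combined figure, the five 1899 values quoted in this section can be averaged. The equal weighting below is an assumption, since this section does not restate how the five indexes are combined; it is shown only because a plain mean happens to reproduce the 46% quoted for 1899. The dictionary keys and function name are illustrative.

```python
from statistics import mean

# 1899 values as quoted in the text, in percent.
indexes_1899 = {
    "franchise": 8,
    "position_player": 70,
    "pitcher": 64,
    "run_environment": 42,
    "schedule": 44,
}


def overall_stability(indexes):
    """Combine the component indexes with equal weights (an assumption)."""
    return mean(indexes.values())


overall_1899 = overall_stability(indexes_1899)
print(round(overall_1899))  # 46
```

With equal weights, the very low franchise number pulls the total down far more than the mature player and pitcher numbers can lift it, which is the substance of the argument against calling the 1890s "stable".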

The argument that baseball had "stable" franchises for the first time in the 1890s ignores the fact that more than a dozen franchises were eliminated early in the decade. It treats these no-longer-existing franchises as if they counted for nothing—but that’s not right. Those franchises had fans, and the fans were left, like the Montreal fans in 2005, holding an empty bag. There was a real need for those other teams, a fact which would be demonstrated in the first decade of the 20th century, when replacements for those teams arose and formed a new league, the American League. When you include the memory of those murdered teams, franchise stability grew only very slightly in the 1890s, while Run Environment Stability went sharply backward, and Schedule Stability grew only very slowly. Looking at everything, while baseball made tremendous progress toward stability from 1876 to 1889, it made almost no progress from 1890 to 1899.

VIII. 1900 to 1919

In 1900 the National League contracted from 12 teams to 8, by killing off four of the weak teams which were owned by syndicates whose owners also controlled strong teams. In 1901 the American League was formed (technically, the American League in 1901 became a major league. It had previously existed as a minor league.)

These things represented a setback to the Franchise Stability Index, which dropped from 8% to 6% (1901). After 1901, however, the Franchise Stability Index began to make regular progress for the first time. The sixteen existing teams began to develop histories stretching back toward the memories of childhood, while the memories of the no-longer-existing teams were gradually fading. The Franchise Stability Index increased to 10% by 1905, to 18% by 1910, and to 24% by 1913. By 1913 many of the teams had something resembling a history for the first time.

In 1914 there was an upstart league, the Federal League; that folded after two seasons. That set the Franchise Stability Index back to 20%, but progress then resumed, and by 1919 that number was up to 28%.

But while Franchise Stability was gaining traction, the Run Environment was fleeing from it. The 1890s had been—and still are today—the biggest-hitting decade in baseball history. Because of a few rules changes and who knows what, this was followed immediately by the worst-hitting decade in baseball history. Run Environment Stability, at 48% in 1901, had dropped to 20% by 1908, the lowest it had been since the 1870s.

After 1908 the Run Environment Stability Index began to move upward, in part because run scoring increased somewhat, and in part because the new reality began to seem normal after a few years. After the low of 20% in 1908 the Run Environment Stability increased to 35% by 1911, and to 57% by 1915. It faded in the late teens as another run drought enveloped the game.

In both of these areas, then, stability in baseball was far higher at the end of this era than it had been at the beginning—and Schedule Stability was tremendous. Baseball had returned to a 154-game schedule by 1904, and stayed with that until the war shortened the 1918 season. The Schedule Stability Index, at 44% in 1900, was up to 91% by 1917. The new leagues also set back the Player Stability and Pitcher Stability Indexes, of course, and the Player Stability Index more than the Pitcher, as is always the case. The Player Stability Index, at 70% in 1899-1900, dropped to 62% by 1902, recovered but remained low throughout this era, for reasons that I do not understand. Throughout most of baseball history the Player Stability Index has been in the low 70s, the Pitcher Index in the mid 60s, but in these two decades both sets of numbers generally hung in the low 60s. I do not know why.

Still, these were two decades of tremendous overall growth in the stability of the game, with all three of the most volatile indicators surging forward. Combining the five indicators into one, baseball’s stability index was 47% in 1900, and, because of the tremendous change in the run environment, this had increased to only 48% by 1908. By 1917, however, it had shot up to a record-high 59%, before the side effects of World War I pushed it down to 57%.

IX. 1920 to 1939

Between 1920 and 1939 all of baseball’s stability indexes shot up. There was no new franchise in that era; there was no discontinued franchise. There were no franchise moves. Franchise stability went only up, and by 1939 had reached 81%.

Let’s think about what that means. 100% franchise stability would mean that every team in baseball had been operating in its current location for at least 50 years, and that no team had been discontinued or moved out of town in those 50 years. We weren’t there by 1939, but 81% is roughly equivalent to 40 years per team. Teams were developing long histories. For the first time, there were young men who could say that Grandpa was a Cardinals fan when he was a boy, just like I am today.
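The conversion from a percentage to "years per team" can be sketched directly. The function name is hypothetical, and the result is a league-wide equivalent rather than a claim about any single club, since the real index also reflects moved and discontinued teams.

```python
WINDOW_YEARS = 50  # from the text: 100% means 50 undisturbed years per team


def equivalent_years(stability_pct, window_years=WINDOW_YEARS):
    """Translate a Franchise Stability percentage into average years of history."""
    return stability_pct * window_years / 100


years_1939 = equivalent_years(81)
print(years_1939)  # 40.5, i.e. "roughly equivalent to 40 years per team"
```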

Schedule stability in this era was perfect. 154 games was the schedule in 1920; it was the schedule in 1939. The schedule stability index reached 100% in 1929, and it would stay there until 1960.

The Run Environment Stability Index, as well, improved markedly. I assume that most of you know that the Lively Ball era began in 1920, with the banning of the spitball and related pitches. But the new, higher run levels of the early 1920s were actually much more in keeping with most of baseball history than were the dead ball levels. The overall norm for baseball history is 4.61 runs per game. From 1920 to 1926 the averages were 4.36, 4.86, 4.87, 4.82, 4.76, 5.13 and 4.64.

The Run Environment Stability Index, which had topped 50% for the first time in baseball history in 1914, reached 63% in 1922, 75% in 1924, and was at 83% by 1928.

In 1929 and 1930 there was an offensive explosion, of course, and this did drop that stability index down to 76% by 1933. After that, however, the numbers went back up until the start of World War II, reaching 90% in 1941.

In all three of those areas, then, this was an era of absolutely tremendous stability. The player and pitcher stability indexes are less volatile than the other three, but those numbers, too, went gradually upward with the game’s increases in stability. The Position Player Stability Index reached 74% in 1924, and stayed in the 70s through the rest of this era, although it never again got as high as 74%. The pitcher stability index peaked at 72% in 1933, a record which would not be beaten until 1974, and hung around 70% for most of this era.

Combining the five into one, then, baseball’s overall stability index was 58% in 1920, 69% in 1925, 73% in 1930, 80% in 1935, and 82% in 1940.

X. 1940 to 1959

World War II (1941-1945) was a very significant destabilizing event in baseball history. Position Player Stability dropped from 71% in 1941 down to 61% by 1945, then dropped again, to 55%, when the pre-war stars (like DiMaggio, Ted Williams, Bob Feller and Stan Musial) returned to baseball in 1946. The pitcher stability index dropped from 68% in 1941 to 57% by 1946.

Run Environment Stability took a hit, as well, as the rubber used to coat the cork center of a baseball became unavailable due to the war effort. The "balata ball", used during the war, was dead; it was sort of like hitting an old sock. Run scoring levels dropped, and the Run Environment Stability Index dropped from 90% in 1941 to 66% by 1946. After 1946 it began to recover. The schedule remained perfectly constant, 100% stable, throughout the war, and there were no franchise moves until the 1950s. But baseball’s overall stability dropped from 83% in 1941 to 74% by 1946.

Now, let’s turn to the question that Jason posed for us: Does the rate of turnover tell us anything about the quality of play?

Well, not really, or anyway not too much. When the pre-war stars were drafted into the Army or went into the Marines during World War II, the Player Stability Index went down, and the quality of play went down. But when those same players returned from the war in 1946, the Player Stability Index went down again, even though the quality of play almost certainly improved.

Look, I have no doubt that stability is generally a partner of excellence. A stable economic environment is necessary for a business to thrive, most of the time; a stable environment is best for raising a child. A stable environment is a precursor to success in almost every field.

But there is such a thing as too much stability. Too much stability is stagnation. Suppose that baseball’s roster rules were changed so that the Player Stability Index went to 90%. It would be entirely possible to do this, if the rules were changed to make it more difficult to change your roster.

It could be done—but would that mean that the quality of play was better? Of course it would not. What it would mean is that good young players were being kept out of the league to protect the roster positions of over-the-hill veterans.

I believe that the same applies to franchises, that there is such a thing as too much franchise stability. If franchise stability goes to 90%, what that means is that new and exciting young cities, like San Antonio and Austin, are being locked out, while older cities are being protected for no better reason than that they have long been a part of the magic circle. We’re at 81% now, and 81% is OK—but if franchise stability goes over 90%, I would be worried about it.

After World War II, after 1946, the overall stability of baseball began to increase again. Schedule stability, at 100% since 1929, stayed at 100% throughout this era. Player and pitcher stability returned to historic norms. Run Environment Stability, down to 66% by 1946, returned to 87% by 1951.

Most remarkable, though, was that the Franchise Stability Index reached 100%. Actually, it didn’t reach a "pure" 100%; it reached 99.6% in 1951, and 99.9% in 1952. We had almost reached the point at which we could say that there had been no franchise disruptions in the living memory of the baseball public. We had come astonishingly close to 50 years without a change. Baseball’s overall stability index in 1952 had reached 85%.

And then all hell broke loose. We have never gotten back to 85% since then, and it is not likely that we will get there any time soon.

Franchises started moving. The St. Louis Browns moved to Baltimore. The Boston Braves moved to Milwaukee, the Philadelphia A’s to Kansas City, the Dodgers to Los Angeles and the Giants to San Francisco. The Franchise Stability Index dropped from 100% in 1952 to 54% in 1958.

Run Environment Stability reached a record-tying 90% in 1955; Player and Pitcher Stability stayed in their normal ranges. But baseball’s overall stability, after a peak of 85% in 1952, had dropped to 78% by 1959—and it would continue to drop.

XI. 1960 to 1979

Baseball expanded for the first time in 1961, adding two teams. Expansion, of course, is a destabilizing event. We added two more teams in 1962, four in 1969, and two more in 1977. By 1969 the majors were 50% larger than they had been just eight years earlier, and had been for decades before that. Franchises continued to move. The Braves moved from Milwaukee to Atlanta. The A’s moved from Kansas City to Oakland. The Pilots, after one season in Seattle, moved to Milwaukee. The Senators moved to Texas. It was a game of Musical Teams. By 1972 there were 24 teams in baseball, most of which had very short histories in their current cities. The eight expansion teams obviously had no long history, but neither did the A’s in Oakland, the Rangers in Texas, the Braves in Atlanta, the Dodgers or Giants on the West Coast; even the Orioles in Baltimore had only been around 20 years, and their attendance was modest.

Baseball, which had used a 154-game schedule since 1920, now switched to 162—and that schedule was cut by a strike, for the first time in decades, in 1972.

The run environment went haywire, runs dropping to the lowest levels in 60 years. All elements of the stability index, in short, were going to hell in a handbasket. Franchise stability, at 100% in 1952, dropped to 39% by the time the franchises stopped moving in 1972. Schedule stability, at 100% for decades, dropped to 85% in 1962 due to the 162-game schedule, recovered quickly to 100%, but then dropped again to 88% in 1969, due to the split into divisions, and to 83% in 1972, due to the strike.

Run Environment Stability reached a record high of 91% in 1962, then dropped to 82% by 1964, 74% by 1966, and to 60% by 1968, when the pitchers ruled the game and only one American League player hit .300. Even the player and pitcher stability indexes were pushed down slightly by the expansions.

There is probably such a thing as too much stability, and perhaps we had seen that in the post-war era, but there is certainly such a thing as too much instability. By 1972 baseball had way, way too much instability. Attendance was stagnant; per-game attendance was going down steadily throughout this era. The NFL began to pass baseball in popularity surveys in 1964.

When a franchise moved, there would normally be a burst of attendance in the new city, a novelty effect. But after three or four years, in many of these cases attendance was very poor. The new cities had no history. People had not grown up going to the games; there was not a long history there. Grandpa never rooted for this team; Grandma never heard of them.

I notice that in general, in many different areas, people commonly confuse novelty effects with long-term benefits. Television seems to manage itself in the perpetual pursuit of novelty effect boosts, shunning entirely the long-term development of loyal audiences. Fast-food restaurants try to compete with McDonalds by coming up with slightly absurd culinary concoctions, putting hash browns and chocolate in a gyro or something, in the effort to attract a novelty-seeking audience. Perhaps this is off the subject, but I notice this every day, that people seem to be making fools of themselves pursuing novelty effects.

From 1952 to 1972, baseball was that pathetic guy who was in love with the floozie, chasing that novelty effect rush. The overall stability of the game dropped from 78% in 1960 to 73% in 1964, 71% in 1968, and 67% in 1972. This was the lowest it had been since 1923. 90% is probably too high; 67% is certainly too low.

Although we haven’t documented it in this package, the Designated Hitter Rule, adopted in 1973, further destabilized the game, and the Messersmith ruling creating free agency in 1975 kicked the concept of stability in the game on down the stairs. But after 1975, the stability of the game gradually began to recover. After 1972 (the Rangers moving to Texas) there were no more franchise moves for more than three decades. After the additions of 8 expansion teams in 8 years in the 1960s, there have been only 6 more expansion teams added in the next 45 years. The schedule stayed at 162 games for the rest of the 1970s.

Both pitcher and player stability reached all-time highs in this era, the position players peaking at 78% in 1980, the pitchers at 73% in 1974 and 1976. (I believe that the small increase in player/pitcher stability in this era was an after-effect of the expansions of the 1960s. Baseball expanded so rapidly in the 1960s that its normal ability to replenish the stock of players was somewhat strained. In other words, I believe that these peaks in player/pitcher stability probably resulted from weakness, rather than strength.)

But baseball was recovering some stability after 1972. The Run Environment returned more or less to normal. After the low of 67% in 1972, baseball’s overall stability had recovered to 75% by 1979, and to 76% by 1980.

XII. 1980 to 1999

And then there was the strike.

Baseball’s first major strike, in 1981, was a tremendously destabilizing event. The memory of it lingered around the game for years; it lingers there yet—and here is what people forget. Along with the strike, the powers that be in baseball decided to split the schedule, treating the second half of the season, after the resumption of games in August, as a separate competition from the first half. In my system, that’s a 20-point penalty for deliberately changing the schedule.

The Schedule Stability Index, which had been at or near 100% for decades, dropped to 54% in 1981. Although it recovered slowly and gradually after that, it was only at 83% in 1993, and then there was an even worse strike, the 1994-1995 strike. They fooled with the schedule again, splitting into three divisions and adding a Wild Card to the playoffs; that’s another 20-point penalty. Schedule Stability dropped to 46%, and has never yet gotten back higher than 88%.

Franchise Stability continued to recover in the 1980s and 1990s, but that’s a long process, building team histories. The Run Environment was normal in the 1980s, but then the steroid era began. Run Environment Stability in the Sammy Sosa-Mark McGwire era dropped to 67%, the lowest it had been since 1968. And the strike was a bomb; the strikes were massive de-stabilizers. The expansions of the 1990s were small de-stabilizers. The steroid era hitting numbers were de-stabilizers. The overall stability of the game, which had been 76% in 1980, didn’t get back to that level until 1993, and ended this era (1999) at 67%, the same sorry point we had been at in 1972.

XIII. The Twenty-First Century

Since 2000—actually since 1996—baseball has been steadily reclaiming its historic stability.

Franchise Stability, which reached a low point of 39% in 1972, is now back to 81%, the highest that it has been since 1953. 81% is a very, very high number for franchise stability—not the 100% of 1952, but very, very high. We are in the second era of great franchise stability.

Run Environment Stability returned to normal ranges after the steroid era ended, reaching a peak of 88% in 2009. Since 2009 we have drifted away from the historically normal range of around 4.50 runs per game. We appear to be entering a new pitcher’s age. 2014 Run Environment Stability is down to 81%, and this number appears to be dropping every year.

Schedule Stability is gradually recovering. It reached 88% in 2010, the highest it had been since 1980, but then was set back by the addition of a second Wild Card team to each league, a schedule interruption. We are now at 83%. The memory of the 1994-1995 strike is not gone, but it is beginning to go away.

Position Player Stability is at 75%, which is higher than it has been throughout most of baseball history.

Pitcher Stability is at 69%, which is higher than it has been throughout most of baseball history.

Baseball’s "Era of Instability", which really began with the move of the Giants and Dodgers in 1958, can be said to have ended with the banning of steroids in 2005. The overall stability of the game reached 79% in 2011, the highest it had been since 1957. It has faded slightly since then, mostly due to the deterioration of the Run Environment. But we are clearly in an era of relatively high stability, for the first time in many years.

COMMENTS (29 Comments, most recent shown first)

MidnighttheCat
Bill James: " But there is nothing about this that is scientifically questionable, in that I have to create what you might choose to call arbitrary standards. It is integral to the process of building knowledge. My life’s work, essentially, has been to build into the baseball conversation as much objective knowledge as it is possible to create. In order to draw something like a slump out of the range of speculation and into the area of knowledge, it is necessary to have a definition of a slump. In order to have a definition of a slump, it is necessary to make some choices that could equally well be made in some other way."

Again, quite right. This is called "operationalizing" ( don't love the term, but that is what we social scientists call it). It is a necessary part of any serious research. Essentially it is the creation of a working definition - something that allows you to study something and for an author or analyst and their readers to at least for purposes of the study, agree on the terms they are using.

Is India more democratic than South Africa? What counts as "democratic" - free elections, number of participating parties in contested elections, or only the ones that have a chance to win? Voter participation (turnout)? The level of open debate in media (newspapers, TV news etc.)? All or only some of the above?

What leads to economic development? Well, what is going to COUNT as development - only growth of GDP? Or the diversification of the economy into more sectors? Percentage of the population above certain poverty levels (and do we define poverty using the World Bank standard of $2 a day, or each country's official government poverty line?), should we count GDP using exchange rates or Purchasing Power Parity (PPP)? What about literacy rates? (very high in China and South Korea, abysmally low in much of rural India)? Percentage of the population that lives in urban areas? That has running water and electricity?

The point is that anyone trying to make any argument using data and giving it meaning must decide which are the relevant metrics, let their public or readers know what these are and then show the results. This allows critiques on the basis of 1) a common set of measurements and 2) the debate over whether these chosen metrics are the best to use. Each of these kinds of debate is worth having and can advance science in any field.
10:24 AM Mar 8th

MidnighttheCat
Bill James: "I get a little bit of feedback from some of you to the effect that you are uncomfortable with my practice of making up "arbitrary" definitions of terms for the purpose of studying something, and some of you may have questioned whether this is appropriate scientific method. I wanted to say, in that regard: It is absolutely appropriate scientific method, there is nothing questionable about it or debatable about it. It is done in all fields of knowledge, and knowledge could not embrace new areas of study without this being done."

Indeed. In Social Science we call this method "Ideal types" and it is associated with the work of Max Weber, but is widely used even by many who do not realize they are using it.

It is not equally appropriate for dealing with all kinds of historical, social or political questions that one wants to study, and for some I would not advocate its use. But there is nothing inherently wrong or unscientific about it, so long as certain steps are taken and one always keeps in mind the difference between one's categories and the real world we are trying to describe and analyze.
10:16 AM Mar 8th

tangotiger
The Marlins dismantled following their 1997 World Series. They went from 2.4 to 1.7 to 1.4 million fans from 1997 to 99. That would add instability to it. The Expos 1995 Fire Sale as well.

But then you'd have to distinguish those from what the Redsox/Dodgers did.

I guess you can get there by looking at change in W/L records, and there's more "instability" when the W/L record changes fast year over year.
9:52 AM Oct 17th

robneyer
When Bill writes that the Senators were, for the fans, essentially the same team, he's essentially seconding Jerry Seinfeld's famous bit about fans merely rooting for laundry. Which I think is true, but only to a point. You turn over the entire roster in the off-season, and none of the fans are going to really care? I find that hard to believe. Granted, this is just one case and it's strange to make a special rule for just one case. But I can't buy the notion that turning over the entire roster isn't destabilizing in some meaningful way.
7:17 PM Oct 13th

astilley
Fascinating. I felt the data was calling out for a line graph, so I put one together that you can see here:
kcbbh.blogspot.com/2014/10/bill-james-stability-index-graphed.html
Bill, if you care to provide access to the exact numbers for each year, I'd be happy to update the graph.
12:56 PM Oct 12th

lporschen
Arctic mike - thank you. That helped.
6:31 AM Oct 12th

MWeddell
It's a minor point in the context of the whole article, but it seems to me that fans of the Washington Senators experienced more instability when the entire team changed to obviously inferior players than fans of the Brewers and Astros experienced when the teams remained constant but most of their opponents changed.
2:45 PM Oct 11th

articmike
I will take a crack at the question from lporschen. As it is obvious that I neither am Bill James nor the author of this article, I suppose that means he ought to take the following with a grain of salt.

Additional AB's for given players compared to last year's season is a sign of growth, not stability. Generally, when a hypothetical 23-year old player turns 24 and gets an extra 200 AB's, that's the game getting better, usually with the retirement of an old player who couldn't cut it any more, or of a marginal younger player who couldn't cut it. Under stability, nothing changes, and there is no growth, nor improvement.

If I may attempt an analogy, adding new teams is growth and is instability. A superstar who comes back from a bone chip and plays 159 games rather than 120 games the year before is instability, growth, potential game improvement and additional excitement. Lots of stability siphons some of the excitement out of the sport. Expansion does not lead to immediate improvement of the game, but, without the expansions of yesteryear, dynamic regions and cities such as Houston, Phoenix and San Diego would not be served by baseball, and baseball would be the worse for it.

The indices in the article have been fashioned so that the new and moved franchises do not show up merely as a 3% (31st club) or 6% (17th club) increase in AB's, or 3% or 6% differential in locations of games, for a single year, then becoming, in year 2, a sign of 100% stability in game locations. Stability slowly increases, with these calculations, as the new team puts down roots, in the index. I find the instability generated from increases and decreases in individual athlete playing time to be consistent with the definitions of the other indices that make up the system.

Baseball grew with expansion, baseball was destabilized by expansion, baseball got worse with expansion (although, generally, temporarily, with better attendance, which was good for the industry) and after some years re-stabilized at a higher-than-pre-expansion level of playing quality and serving a multi-regional live audience.

Adding new populations is de-stabilization, but it's growth. The game improves from having African-American, Asian, Asian-American and Latin American ballplayers, and the introduction of each new population is de-stabilization. Once the population within MLB of a new demographic, that started with just a sprinkling of players, reaches a more reasonable mass, baseball begins seeing long careers of a cadre of players who could not previously have been signed. When the population is first diversified, star players from a new sub-population, in mid-career, join MLB. Later, you have players with 15-year MLB careers, who lack significant pre-MLB high-level baseball experience, who start showing up on the chart as a source of instability, as they graduate from mid-season call-ups to front-line players. During their 5, 7 or 10 years as mainstays of their organizations, they form a stable source of AB's and IP's that show up as the improved game returning to stability.

If all of the foregoing fails to make one believe that 91% stability (for example) means bad baseball rather than better baseball, then that probably means the foregoing either was wrongheaded or poorly written. In the opposite case, you would probably see that an atrophied baseball with little change would lose fans. More 38-year-old players would keep their front-line jobs, young talent would stay in the minors, and players who got 150 AB in their rookie years as hurry-up injury-replacement call-ups would keep on getting 150 AB/yr. at 25, 26 & 27.

When those young players and stars in their primes, rather, take up a bigger share of all of a year's baseball action, sprinkling instability into a stable game, the indices are doing their job of providing an approximation of change in the game; as long as you accept the idea that 100% stability is ennui or atrophy or ossification or some other bad thing.

To offer an example, putting Robin Yount's entire 1975 and 1976 seasons into the stability category, noting that 1974 was his rookie year, and saying he created no instability his 2nd and 3rd seasons, would not properly account for the change he was exerting in his little corner of the baseball world -- as he aged from 18 to 20, began playing 161 games in a year rather than 107, and set the stage for a very positive element of stability in baseball 1976-1990.

Philadelphia, Boston and St. Louis still would have two teams each in stable baseball. Minneapolis, Ft. Worth and Atlanta would have no teams. Attendance would be down. TV viewership would be down. There wouldn't be anything terrible about having baseball in Montreal, but, with no fans and a previous-generation stadium with a roof that doesn't work, and a city government full of people who grew up reading about the folly of a public works program undertaken to attract Olympic sports, having baseball in Montreal wouldn't be excellent. It's probably better to have a vibrant team in a new ballpark in Washington.

When Washington loses an MLB team, or when Los Angeles loses an NFL team, that might be bad instability; but some of the other instances of a game's instability refresh the whole game and allow improvements.

If I understand the article, good and improving baseball can be played at approximately 70% or 80% stability, but 90% stability would be scary. The growth in the at-bats and innings pitched of a cadre of young players and injury-comeback players is part of the necessary refreshment of the game; labeling it as one aspect of instability in this context is ok with me.
10:46 AM Oct 11th

lporschen
Bill,
I'm struggling with the concept of additional plate appearances by a player over the previous year counting as instability.
Would looking at how many players had at least one plate appearance in 2014 as the denominator and how many players had their very first ever plate appearance as the numerator, give you another measure of stability?
2:11 AM Oct 11th
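
The ratio lporschen proposes above (first-time batters divided by everyone with at least one plate appearance that season) is easy to sketch. This is only an illustration of the commenter's suggestion, not one of the article's indices; the function name and the player IDs are made up for the example.

```python
def debut_share(season_players, earlier_players):
    """Fraction of this season's batters who had never batted before.

    season_players: IDs of players with at least one PA this season.
    earlier_players: IDs of players with a PA in any earlier season.
    Both names are hypothetical, chosen for this sketch.
    """
    current = set(season_players)
    if not current:
        raise ValueError("no players with a plate appearance this season")
    debuts = current - set(earlier_players)
    return len(debuts) / len(current)


# Made-up IDs, for illustration only.
earlier = {"a", "b", "c", "d"}
season = {"b", "c", "d", "e", "f"}
share = debut_share(season, earlier)
print(f"debut share: {share:.0%}, stability by this measure: {1 - share:.0%}")
# prints "debut share: 40%, stability by this measure: 60%"
```

Note that this counts heads rather than playing time, so a September call-up with one plate appearance weighs as much as a rookie with 600.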

jemanji
One other thing about science: NON-scientists often come to think of "double blind controlled experiments" as science, and everything else as not.

Actually, the "irreducible components" of science are simply --- > *systematic inquiry, especially where tests are applied in an attempt to prove our assumptions false.*

When Columbus explored the world by sailing a ship out to the horizon, that was science. When Semmelweis started washing his hands to treat patients, he used a "points system" (in his head) to simply observe a tendency for nurses with clean hands to give fever to fewer patients.

Bill, at a very fundamental level, is interested in proving his beliefs false … and in that way coming to believe things he can't prove false. For him to be accused of "bad science" is strange.

It's a type of snobbery for us to believe that --- > if it hasn't been peer-reviewed by Accepted Experts, if it isn't in a hackneyed format, then it's coming from the Village Idiot. That happens in biology when disfavored theories are argued, and it happens in sabermetrics when unusual "points systems" are argued, and are not approved by one of the top three or four saber sites.

Yes, it's snobbery, elitism, and it hinders real science. I mean it in a good way.

;- ),
Jeff

12:53 AM Oct 10th

BobGill
Regarding 1901-19 being "two decades of tremendous overall growth in the stability of the game," I think about a dozen of the 16 major league teams built their first steel-and-concrete parks from about 1909 to 1915, right in the heart of this period. That suggests that maybe for the first time the owners believed they were here to stay, and it now made sense to think about building a permanent home. Sounds like another argument for the game's growing stability.
5:01 PM Oct 9th

BobGill
"They’re called the Washington Senators both seasons, they play in the same park, they wear the same uniforms. From the fan’s standpoint, that’s the same team." I live in the Maryland suburbs. I started following baseball in 1962, so the move to Minnesota didn't matter to me, but my father, who had been following the Senators since 1933, didn't get over it for years. The Senators had been awful for the last six or eight seasons before the move, but in 1960 they perked up a bit, and with players like Killebrew, Pascual, Kaat, Battey, Allison and so on, it looked like they were starting to assemble a good team -- which turned out to be true, of course, but that team came together in Minnesota, not here. I heard about it frequently: "If Griffith wanted to move, why couldn't they give him an expansion team and let Washington keep the real Senators?" In short, my dad NEVER considered the new Senators to be the same as the team that left town, and he couldn't have been the only one who felt that way. Just for the record.

3:26 PM Oct 9th

tangotiger
Just finished reading. Wonderful.

One other source of stability: black players, and foreign-born players.

If I remember right, the number of US-born players in MLB in 1969 with 24 teams (if you count by PA and IP) is similar to the number today with 30 teams. So, we got an influx of 6 teams' worth of players, but all are foreign-born.

In some respects, expansion forces the issue, as teams are more tempted to look elsewhere, so we are in effect double-counting.
1:37 PM Oct 9th

tangotiger
"It takes a long time for the memory of a strike to fade away, for the point to be reached at which it doesn’t matter anymore."

While I agree in the basic concept that you are presenting in terms of schedule stability, I don't think that the memory lingers like this for a sports fan. I think it's very much isolated to baseball.

Hockey has had 3 disruptions AFTER the last baseball disruption (two seasons cut by 40%, and one season lost altogether), and yet fans come back in droves. I don't follow NBA, but I get the sense that it's not really disruptive beyond that season. It doesn't linger. But it does for baseball (and maybe because baseball is an every-day game, and all the other sports often go a few days without a game).

So, to the extent that you may want to apply the reasoning to other sports, I'm not sure that it's going to be handled similar to baseball.
1:24 PM Oct 9th

tangotiger
In other words, I think that at the start of 1994, I'd likely consider the Jays to be more stable than the Expos, even though Montreal had an 8 year head start. And the Mariners to be less stable than the Jays, even though they started at the same time.
1:18 PM Oct 9th

tangotiger
Just finished reading the franchise stability section. Very interesting.

A couple of questions: for a team like the Expos (RIP), rather than decaying them over 50 years, why not decay them over the number of years they were in existence? This way, a team that was around only for a few years will decay very fast over the same amount of time.

I'm surprised with the Braves. Do you think .550+ seasons or a World Series will accelerate its stability? If we look at the 1970s Orioles or the Royals in the 70s/80s, do you think they are more "stable" than other less successful (but same aged) teams? The 1992/93 Jays would seem to have given it "stability". Not sure though. Just a thought that I haven't worked through.
1:16 PM Oct 9th

tangotiger
And I'll add that what Bill does is better, because it is transparent. It's easily reproducible, and therefore, easier to test.

Compare that to the 90% of studies I read, that I term "mathematical gyrations". There's no one that loves math and numbers more than I do. And my heart breaks when I see what they do, jumping through hoops to prove something that is layered with so many calculations, that you can't even begin to try to evaluate the methodology.

Now, time to read this article!
12:13 PM Oct 9th

bjames
Right. The fact that it is scientifically legitimate if done right doesn't mean that I'm doing it right.
11:56 AM Oct 9th

tangotiger
I just read the preface (so far) and I concur with Bill. I actually just talked about this yesterday on my blog:

tangotiger.com/index.php/site/comments/game-score-and-crowdsourcing#8

"
One of the things that Bill does in a lot of the work he does is give out various “points” for various things.

I always question those, because I never know if there is a bias in there. They all seem reasonable enough, but you can never tell if there’s a bias, until you really dig into it.

So, this is a good example of how most of the Game Score works pretty well, and it’s got a little hiccup in there.

It is overall impressive that Bill can build these kinds of tools, in what seems arbitrary, but has method to its madness.

We just need to be aware of possible biases.
"

***

So, what he is doing is definitely scientific. The only thing to be on the lookout for is bias. You could for example have a system that has a value of "10" for strikeouts instead of "5". The rest of the system looks great. And when you see the results, most of the results will look great (because most players won't have an abnormally high K), except you notice that high K guys appear a lot. It COULD have been legitimate, but if you look at it in other ways, you would conclude that the weight for the K was too high, thereby leading to a bias.

That's the kind of thing that everyone needs to look out for: Bias.

11:02 AM Oct 9th

wovenstrap
The experience of the Cleveland Browns is recent evidence that your judgment on the Senators is the correct one. Technically, the Browns' history diverts to Baltimore around 1997 -- a couple years later, the Browns were given a new franchise with the same uniform, name, etc. as the old franchise. I'm a resident of Cleveland, I think it's safe to say that the fans were easily able to repress the fact that they had what can only be called an expansion team on their hands, and knit together the old Browns and the new Browns into one conceptual entity. It's not logical, but it's totally natural -- for the fans, the old Browns were back, that was all that counted. And it's not really wrong, either. You could argue that the denial of the fact that they have an expansion team has led to a mismatch of fan expectation -- Browns fans are loyal but also very hard on the team, fatalistic, it's possible that Browns' fans would be less bitter if the team were called "the Ohio Jazz" with teal uniforms. They would not be able to repress the fact that their team is brand new, and a series of 4-12 seasons would be considered about the norm. Instead it triggers disgust.
10:48 AM Oct 9th

DanaKing
Thanks for this. Fascinating reading for someone who wonders what things were like in other eras.

I have one question, about scoring environment stability: would it make any difference if, instead of using the average runs per team per game, the number used for comparison was the median runs per game per team? That's a somewhat more stable number itself, or does the large number of data points make that come out to be pretty much of a wash?
10:48 AM Oct 9th

3for3
How about player stability on a team? How many AB's/IPs were taken by players on the same team as last year? This has to have shown massive decreases over time. But has it stabilized? Or is it still decreasing in the 21st century?
8:59 AM Oct 9th
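
A rough sketch of the measure 3for3 describes above: the share of a season's at-bats taken by players who batted for the same team the year before. The data layout (a mapping from `(player, team)` to at-bats) and all names and numbers are assumptions made for this example; nothing here comes from the article's own calculations.

```python
def same_team_share(current, previous):
    """Share of this season's AB taken by players who also batted
    for the same team last season.

    current, previous: dicts mapping (player, team) -> at-bats.
    This layout is hypothetical, chosen to keep the sketch short.
    """
    total = sum(current.values())
    if total == 0:
        raise ValueError("no at-bats this season")
    # Count AB only for (player, team) pairs that also had AB last year.
    returning = sum(ab for key, ab in current.items()
                    if previous.get(key, 0) > 0)
    return returning / total


# Made-up numbers, for illustration only.
last_year = {("p1", "BOS"): 500, ("p2", "BOS"): 300, ("p3", "NYY"): 400}
this_year = {("p1", "BOS"): 450, ("p2", "NYY"): 350, ("p4", "BOS"): 200}
print(same_team_share(this_year, last_year))  # prints 0.45
```

Here only p1 stayed put, so 450 of 1,000 at-bats count as same-team. Running this season by season, and doing the same for innings pitched, would show whether the share is still falling.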

bobfiore
Did you give any thought to what effect the appearance of a replacement expansion franchise does to the emotional investment in the lost team? You would have direct experience of this with the Athletics and the Royals. In some cases the new team must wipe the old team off the slate. On the other hand, the Mets still operate somewhat in the shadow of the Dodgers and Giants of old.
12:58 AM Oct 9th

articmike
I'm 100% in agreement with your preamble about functional definitions and statistical parameters, the creation of them, being in no way inconsistent with pursuit of scientific progress. I have heard that in some fields of study, such as political science, approaches similar to the ones you used in your examples are generally accepted by practitioners in the field. I do not know whether the types of studies in political science that have been described to me actually count as "models" in the sense of the economic studies referred to on this page, but the consumers of such studies generally accept them as being a part of the process of scientific inquiry, whether they qualify as models or not.
12:17 AM Oct 9th

articmike
If there actually were a calculation that showed that an average MLB career lasted 4 years (I'm not saying there is one; I haven't done the math myself; shoot me), would that have anything to do with a 75% player stability rate (a 25% turnover rate)? Would the two have nothing to do with one another, or something to do with one another but not closely related?
12:11 AM Oct 9th
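
On articmike's question above, there is a simple idealized link between the two numbers. If every player had the same chance of leaving the game each year, a 25% annual turnover rate would imply an average career of 1 / 0.25 = 4 years. The sketch below checks that arithmetic; the constant exit probability is an assumption made for the example, and the function name is hypothetical.

```python
def expected_career_years(turnover_rate, horizon=1000):
    """Mean career length if every player leaves the game with the
    same probability each year (a simplifying assumption).
    """
    if not 0 < turnover_rate <= 1:
        raise ValueError("turnover_rate must be in (0, 1]")
    stay = 1 - turnover_rate
    # P(career lasts exactly k years) = stay**(k - 1) * turnover_rate
    return sum(k * stay ** (k - 1) * turnover_rate
               for k in range(1, horizon + 1))


print(round(expected_career_years(0.25), 6))  # prints 4.0
```

Real careers do not work this way (exit risk is highest for fringe players and the old), and judging from the discussion in these comments the article's player index is weighted by playing time rather than by head count, so the two figures are related but only loosely.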

articmike
Mr. James,

Do you see an overlap between these, your most recently published studies, and your suggestions of How To Break Baseball History Into Eras, about which there is a June, 2012 spreadsheet here on the site?

Besides an overlap, did doing these studies change any of your opinions, or confirm your previous estimates, of the "point values" of big change moments in baseball history, as reported via that older spreadsheet and article?

Does it sound interesting to you to break up (or break down) the change events enumerated in the old spreadsheet into those covered by your 5 stability indices and those that would belong in the missing 6th (and 7th, 8th, 9th & 10th) index if those indices ever get quantified? By this, I'm getting at a concept where the old eras calculation would be recalculated with new point values for those change events to which you've calculated percentage changes in these new studies, and the same point values for the other change events until somebody creates a series of values for the 6th & later missing indices.

I looked at the 2012 spreadsheet weeks ago; so before I read this; and I recall it going in one direction -- increasing numbers of points as time went on, as (things such as) offense levels rose, fell, rose, fell, etc. Don't you think your current approach here, where everything swings between 0 and 100, and where the old (such as a pitcher's era) can become new again, does a better job of measuring whether baseball fans perceive themselves, at a given point in time, as being in a brand new era, or a familiar era?
12:07 AM Oct 9th

MarisFan61
Bill: At least regarding what I've said about some of your statistical parameters for terms that are in general use, my criticism hasn't been anything about whether it's an appropriate scientific method, but about whether either the parameters or your eventual answers are good matches with what is generally meant by the term. I thought the "Bench Player" thing didn't very well match the usual concept (notwithstanding your thinking that the criticisms were nuts) :-) .....and a few years ago, when you studied "Consistency" of starting pitchers, I thought you were studying something other than what people generally mean when they talk about starters being "consistent" or (more commonly) "inconsistent," which was what had seemed to be the thing that inspired the study.
10:59 PM Oct 8th

jemanji
Very info-taining. Thanks!

The first guy who draws a triangle around a problem/mystery/riddle is going to get kibitzers, even resentment.

Only thing I'd add: Tightening that triangle is part of the natural evolution of human knowledge. Rather than resent the proposed framework, we need to get about the business of using the jumping-off point to our advantage. I don't know how many complaints I heard about the first Similarity Scores and similar "points system" triangulations, but … the guy who takes the first cut at it deserves our thanks.

For my money, that's one of your great gifts. Not just the first triangulation of a problem, but also how tightly you're able to draw these triangles at the first attempt.

……..

As to the content in the essay, as opposed to the way of thinking about the problem … :- ) there's an amusing irony here. Baseball often becomes unstable BECAUSE it likes to think of its rules as sacrosanct.

If it would be more flexible about changing its *rules,* the game itself would be more malleable and resilient. It would retain more of the "old friend" flavor that is so important to its corporate brand.

Appreciate the analysis,
Jeff
8:58 PM Oct 8th

flyingfish
Hi, Bill. Interesting article. I think what you describe in the preamble to the article, making definitions and then using them to extract knowledge, is akin to what scientists call making models. Maybe it's the same. The crucial thing about scientists' models is their ability to be predictive. Can you, based on your model, predict the outcome of a particular set of circumstances? Can you, as an evolutionary biologist using Darwin's model of evolution by natural selection, make predictions about the natural world? (Yes, and Darwin did.) Can you, as a meteorologist, predict tomorrow's weather based on your model? Are some models better than others?

It seems to me that you are doing what I've described above. I'd be interested to know if your stability models allow predictions of things, either in the future, which we'll have to wait to test, or of historical relationships that we haven't understood before.
8:26 PM Oct 8th

COMMENTS (29 Comments, most recent shown first)
