At the beginning of the season, people tend to scrutinize statistics more closely than usual and make sweeping statements ("Hosmer should be sent down to Omaha because he is batting .183!"). However, those statements ignore the concept of sample size within the context of baseball.

Unlike any other sport, baseball is a game of averages. Over a 162-game season, players' true talent levels are generally well represented by their statistics. However, 15 games into a season, statistics tell us very little. To illustrate this concept, I have done some statistical legwork. According to Craig Brown at Royals Authority (who, along with Rany Jazayerli, is the inspiration for writing this blog), it takes about 650 plate appearances (PA) to achieve a statistically meaningful sample size. Here is some work detailing what 650 PA gets us. All the statistics I use in this analysis are from 2011 and include only those players with enough PA to qualify for the batting title.

Here I use the statistic wRC, weighted Runs Created, which is essentially a measure of how many runs a player provides to his team offensively. I used a sample size formula from my training in epidemiology, a field that leans heavily on sample size calculations. Let us do a little thought experiment. Consider two players, both with 650 plate appearances. My little statistical formula says that the smallest difference in wRC we can detect between the two players is 3.43, which is about 4% of the average wRC. According to FanGraphs, over a full season a 120 wRC is "excellent", a 100 wRC is "great", an 80 wRC is "above average", a 60 wRC is "average", a 55 wRC is "below average", a 50 wRC is "poor", and a 40 wRC is "awful". Each of those tiers is at least 5 wRC apart, so 650 PA lets us tell an average, a below average, and a poor player apart relatively easily. With fewer PA (a smaller sample), the smallest difference we can detect grows larger. So, with fewer than 650 PA we might have trouble detecting a difference between an average player and a below average player.

In contrast, at 60 PA the smallest difference we can detect is 11.3 wRC. That gap is wide enough to span the average, below average, and poor tiers. One player could have a wRC of 60 and be considered average. Another player could have a wRC of 48.7 and be considered poor. At 60 PA, those two wRC numbers are not statistically different from each other. As far as the data can tell, they are essentially the same player.
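For anyone who wants to check the arithmetic, here is a rough sketch of how the two numbers relate. I did not spell out my formula above, so this assumes the textbook two-sample comparison of means with a two-sided alpha of 0.05 and 80% power, and it treats PA as the sample size the same way the thought experiment does. The function names and the `sd` value are illustrative: the standard deviation is backed out of the 3.43 figure, not taken from the 2011 data.

```python
from math import sqrt
from statistics import NormalDist


def z_total(alpha=0.05, power=0.80):
    """Two-sided critical value plus the power quantile (assumed settings)."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def min_detectable_diff(n_per_player, sd, alpha=0.05, power=0.80):
    """Smallest difference in means detectable with n observations per player."""
    return z_total(alpha, power) * sd * sqrt(2 / n_per_player)


def implied_sd(diff, n_per_player, alpha=0.05, power=0.80):
    """Invert the formula: the standard deviation implied by a given difference."""
    return diff / (z_total(alpha, power) * sqrt(2 / n_per_player))


# Assumption: back the standard deviation out of the 650 PA figure from the post.
sd = implied_sd(3.43, 650)

diff_650 = min_detectable_diff(650, sd)
diff_60 = min_detectable_diff(60, sd)

print(f"650 PA: detectable difference of about {diff_650:.2f} wRC")
print(f" 60 PA: detectable difference of about {diff_60:.2f} wRC")
```

Whatever settings are assumed, the detectable difference scales with one over the square root of the number of PA, so cutting from 650 PA to 60 PA makes it about 3.3 times larger. That is the jump from 3.43 to roughly 11.3.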

I wanted to do this analysis with rate statistics like wOBA (weighted on-base average) and OBP (on-base percentage), but the formulas for calculating sample sizes for proportions (which these are) are quite tricky. wRC is a sufficient offensive statistic for the purposes of this analysis.
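To give a flavor of what the proportion version might look like, here is a minimal sketch using the standard two-proportion sample size formula. It is not the analysis I ran. It assumes every PA is an independent trial with a fixed chance of reaching base, a two-sided alpha of 0.05, and 80% power, and the two OBP values are hypothetical players made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def pa_needed_for_proportions(p1, p2, alpha=0.05, power=0.80):
    """PA per player needed to distinguish two on-base rates p1 and p2."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    # Each player's PA outcomes are treated as Bernoulli trials (an assumption).
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z ** 2 * variance_sum / (p1 - p2) ** 2)


# Hypothetical players, not real 2011 numbers.
HYPOTHETICAL_OBP_A = 0.320
HYPOTHETICAL_OBP_B = 0.340

pa_needed = pa_needed_for_proportions(HYPOTHETICAL_OBP_A, HYPOTHETICAL_OBP_B)
print(f"PA needed per player: {pa_needed}")
```

Under these assumptions, separating two players 20 points of OBP apart takes several thousand PA each, far more than a single season provides.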

So what did we learn from this analysis? We learned that 60 plate appearances is woefully insufficient to gauge the worth of a player. We should now realize that even though Hosmer has not performed as well as he has in the past, 60 PA is not enough to make us worry. Hosmer is hitting the ball well and has had terrible luck; he will right the ship eventually. 650 PA is a good threshold to reach before we start making judgments about the offensive worth of a player. For reference, 650 PA is roughly 162 games' worth of data on an offensive player. This works out well, yes? Should anyone question these methods or want more information, leave a comment and I will see if I can provide an answer.

Kevin - Thanks for the kind words. You hit the nail on the head as to why I threw out 650 plate appearances... It's a full season of work. We can watch Yuni Betancourt for two weeks and think he's a good choice to be a leadoff hitter, but the full picture tells us otherwise. And the fact that we have several seasons of data, each around 650 plate appearances, would confirm our assumptions.

Good stuff. Glad I could help. :)