Tuesday, October 25, 2016

Average vs. median

Most people are familiar with the concept of average (or, in more technical terms, arithmetic mean) when talking about numbers. The average of a group of numbers is their sum divided by how many of them there are. This concept crops up all the time in all kinds of things, like average salaries, average prices, average viewership... Statistics just love averaging things.

As an example, the average of the numbers 1, 3, 4, 10 and 25 is 8.6.

A lesser-known related concept is the median. Many people don't know what it means at all, and others may have heard or read the name without really knowing what it is. Once they hear the definition, it might seem like an arbitrary, even useless, thing.

The median of a group of numbers is simply the middle one, when the numbers are sorted in increasing order. (If there's an even number of them, then it's the average of the two middle numbers.)

In the above example, the median of 1, 3, 4, 10 and 25 is 4, as that's the third number in the ordered list of five numbers.
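
As a rough sketch of both definitions, here's how they could be written in Python. The function names average and median are just my own picks for illustration (Python's standard statistics module already has ready-made mean and median functions).

```python
def average(numbers):
    """Arithmetic mean: the sum divided by how many numbers there are."""
    return sum(numbers) / len(numbers)


def median(numbers):
    """Middle value of the sorted numbers.

    With an even count there is no single middle value, so the
    average of the two middle values is returned instead.
    """
    ordered = sorted(numbers)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


values = [1, 3, 4, 10, 25]
print(average(values))  # 8.6
print(median(values))   # 4
```

Note that the median needs the numbers sorted first, while the average doesn't care about their order.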

But what use is the median? As said, it may feel like an arbitrary, even useless, thing to calculate. However, there are many situations where the median is actually more useful and informative than the average.

The good thing about the median is that extreme outliers have practically no effect on it: one absurdly large or small number only shifts which number ends up in the middle by a position at most.

For example, suppose that there's a market for a product, like an individual card from a trading card game. There may be hundreds and hundreds of sellers for that particular card. In order to get a picture of how valuable that particular card is (e.g. compared to other cards from the game), you may want to know a number that reflects the overall pricing.

The average of all the prices might sound like a good idea at first, but its problem is the flip side of what I just mentioned: extreme outliers may distort the figure, making it less informative and useful.

For example, maybe 30 sellers are selling the card with prices ranging from 50 cents to 1 dollar. But there's one seller that, for whatever reason, is selling it for 1000 dollars. If there's e.g. an automatic server-side program that collects all these selling prices and averages them, that one outlier would skew the result drastically, making it almost useless. It would make the card look much more valuable than it really is.

The median of the prices, however, can be much more useful. In this example, the average would be about 33 dollars (which would mean, if taken at face value, that this is a really expensive card), while the median could be something like 74 cents (which would mean this is a moderately priced card).

In this case the 74 cents is much closer to the truth than the 33 dollars. The latter number is heavily skewed by that one outlier. The median is barely affected by such outliers, making the resulting number much more useful.
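
To see the effect with actual numbers, here's a small sketch using Python's standard statistics module. The prices are made up for illustration: 30 listings spread evenly between 50 cents and 1 dollar (handled in cents to keep the arithmetic simple), plus the one 1000-dollar listing.

```python
from statistics import mean, median

# Hypothetical listings, in cents: 30 sellers spread between 50 cents and 1 dollar
normal_prices = [50 + (50 * i) // 29 for i in range(30)]

# ...plus the one seller asking 1000 dollars
all_prices = normal_prices + [100_000]

print(f"average: {mean(all_prices) / 100:.2f} dollars")
print(f"median:  {median(all_prices) / 100:.2f} dollars")

# For comparison, the same figures with the outlier left out
print(f"average without the outlier: {mean(normal_prices) / 100:.2f} dollars")
print(f"median without the outlier:  {median(normal_prices) / 100:.2f} dollars")
```

With these made-up prices the median is 75 cents, while the average comes out a bit under 33 dollars. Leave the 1000-dollar listing out and the average drops back under a dollar, while the median hardly moves.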
