How to overlay percentiles on a histogram in Microsoft Excel
[By the way, decile is pronounced “dess-isle”, with “isle” as in “Isle of Skye”.]
When you’re looking at a hospital length of stay frequency distribution (the example below is for emergency medical admissions at a medium-sized district general hospital), it sometimes helps if people can see at a glance what percentage of patients were discharged in less than a day or in less than three days or in more than 15 days and so on.
And if you are just looking at a histogram on its own like this, it can be difficult to see these percentages without having to do some awkward mental arithmetic.
However, if you superimpose deciles onto the chart, then you can see at a glance how the distribution is spread out in percentage terms:
Now we can see that 30% of the patients had lengths of stay less than 24 hours, and that 10% of patients were in for longer than 17 days, and so on.
You could superimpose the deciles just by using Excel’s drawing tools, by pasting vertical lines over the top of the chart in the right places. But that would be a bit clunky, a bit "amateur-ish", so this article shows you how to do it “properly” using Excel’s Y-error bar functionality.
The data we'll work with is shown in the worksheet in Step One, which you'll find in this Excel workbook:
All you’ve got is a whole load of numbers (6,446 to be exact), each of which is an individual length of stay figure (measured in days) for the patients admitted as emergencies in 2011. Here are the first ten lengths of stay:
Step Two shows how we’ve generated a frequency table of these length of stay figures using the =FREQUENCY function.
Step Three shows that we’ve had to insert a column that will have the labels as we’d like to see them in the histogram. We’ve done this because the labels in the frequency array function are upper limits instead of lower limits. In the example above, the first figure of 1,781 represents all the patients who had a length of stay of up to one day. In reality (and to satisfy the demands of convention) it's usually better if we label such patients as having a zero-day length of stay.
Step Four shows the histogram that’s been drawn using the data in columns D and E of the Step Three worksheet.
Now for the deciles. In the worksheet in Step Five we’ve calculated the 10th, 20th, 30th, 40th , 50th, 60th, 70th, 80th and 90th deciles for the length of stay data in columns H and I.
We now have to superimpose these deciles onto the histogram that we created in Step Four.
First of all—and this is what we’ve done in Step Six—we’ve assigned a Y-value of 2,000 to each of the deciles that were calculated in Step Five. We’ve done this because we’re going to draw a scatterplot of these points in a minute, and we want each point to be positioned horizontally along the 2,000 line on the y-axis, but with an x-co-ordinate that corresponds to their decile position (that’s why we calculated the deciles in Step Five).
Next, we draw the scatterplot of these nine pairs of points and we format it so that it looks like the one in Step Seven:
Next we add Y-error bars to the nine points so that we get vertical lines dropping down to the x-axis. This is what you should be able to see in Step Eight:
If you’ve never copied and pasted one chart over another chart in Excel, you might be surprised by Step Nine because that’s what we’re going to do. In order to show it we’ve made a copy of the histogram that we drew back in Step Four, so that it now becomes Step Nine. And then I’ve copied the scatterplot chart from Step Eight and pasted it slap bang over the top of the histogram in Step Nine.
It will look a bit weird at first. But if we do a few more things to it you’ll see that it starts to look like we want it to turn out.
In Step Ten, for example, you can see that we’ve right-clicked on the histogram data series (which had been temporarily turned into a scatterplot) and we’ve clicked on Chart Type… and we’ve turned it back into a histogram (whilst leaving the deciles as a scatterplot).
But the biggest problem we need to fix is that the Y-error bars are dropping in the wrong places. The first decile (the 10th percentile) has kind of disappeared off the left of the graph. And the ninth decile (the 90th percentile) is hitting the x-axis at 17.4 days when it should be hitting it at 17.9 days.
To fix this we need to plot the scatterplot dots on a secondary x-axis which we will then re-scale so that it is in harmony with the existing primary x–axis.
Step Eleven shows the chart with a secondary y-axis (which—strictly speaking—you don’t actually need but the only way I know of getting a secondary x-axis is to put a secondary y-axis in place first then go into Chart Options… and then add in the secondary x-axis from the Axes tab.
But if you scale your secondary x-axis so that it has got exactly the same number of points on it as the primary x-axis then you should get your deciles so that are all nicely in the right place.
Then, with a bit of final formatting, you should end up with a chart that looks something like the one in Step Twelve:
[28 February 2012]
|© Kurtosis 2012. All Rights Reserved.|