Simon Dodds: informing improvement
An email conversation between Neil Pettinger and Simon Dodds
I met Simon Dodds over a curry on a weekday evening in Newcastle-under-Lyme in the middle of November last year. Three hours passed in an instant, two middle-aged blokes discussing the NHS, data, improvement science and a host of other things. Three hours wasn't long enough; there was clearly unfinished business. So our discussion resumed—by email—earlier this month.
Simon Dodds describes himself on his LinkedIn profile as a Healthcare System Architect. But he is also a consultant surgeon at the Heart of England NHS Foundation Trust and the founder of SAASoft Ltd.
The following conversation took place over a series of emails in January 2014. It began with Simon commenting on some terminological sloppiness in a visualization piece I wrote earlier this month.
In my recent blog Rows of Dots: visualizing the asymmetry between hospital arrivals and departures, I made the point that—in the first half of the day, up until about 2pm, in fact—Acute Medical Units (AMUs) tend to admit more patients than they discharge. And I said that this means that "if there weren't enough empty beds at the start of the day, it will be a big struggle to match demand and capacity as the day develops, and the likely result will be long waits for the patients in the Emergency Department."
Yes, you are right about the data. There do tend to be more admissions than discharges in the morning. But I have a problem with the other part of your statement. I want to know what your implied definitions of “demand” and “capacity” are? In particular, what units of measurement are you using? My reason for asking is that unless you are crystal clear on this then you may unintentionally be spreading more confusion.
Here’s an example of what I mean. If we define demand as an event count metric such as “number of patients who arrived between 09:00 and 17:00” then the units are "patients per unit time". So if you are going to compare this flavour of demand with “capacity” the units of capacity must be the same—or your comparison is meaningless and confusion will result.
Later in the piece you imply that capacity is "number of beds"—i.e. a count of the number of patients who could be stored at any point in time—which has the units of "patients". This is not the same as the previous definition of "demand" so the comparison is meaningless, and if unconsciously assumed to be the same (our intuitions are very sloppy in this respect) then we might intuitively jump to the seemingly obvious solution of "more demand requires more beds." This is what people do all the time and it is an invalid conclusion. And I think the demand and capacity confusion has been sustained rather than defused by loose use of terms like "demand" and "capacity".
OK, I admit that I was being sloppy in my use of words there. Yes, I, too, have always had a sense of unease about the way the NHS measures inpatient demand as patients per hour or day or whatever and capacity as just “beds”. But my main interest is about helping NHS staff to use data to visualize complex reality. As a result, my energies are more focussed on trying to create graphics that have resonance for the managers and clinicians trying to understand the system they inhabit.
Since you also raised the point about visualization, let me show you how I would go about visualizing the same data.
First, here’s a Gantt chart (on the right) that shows the work in progress (WIP) for the AMU on the day in question. For the time window of 8th January, the WIP is given by counting the number of horizontal lines of green squares that are crossed by each time-interval column.
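The counting rule described here can be sketched in a few lines of code. This is a minimal illustration, not the actual AMU records: the stay intervals below are invented, and each stay is represented as an (admission hour, discharge hour) pair.

```python
# Sketch: work in progress (WIP) at each hour, derived from a list of
# (admission_hour, discharge_hour) stay intervals. Counting the stays
# that span a given hour is the code equivalent of counting the green
# Gantt-chart rows crossed by that time-interval column.
# Illustrative data only, not the real AMU dataset.

def wip_at(hour, stays):
    """Number of stays spanning the given hour (admitted at or before
    the hour, not yet discharged)."""
    return sum(1 for adm, dis in stays if adm <= hour < dis)

stays = [(2, 10), (3, 14), (9, 12), (9, 20), (11, 18)]

wip_series = [wip_at(h, stays) for h in range(24)]
print(wip_series)
```

Summing down the columns of the Gantt chart in this way produces exactly the bottom-line totals that the WIP chart below plots.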
As well as a Gantt chart, I’d also draw a WIP chart (below). One that effectively summarises the bottom-line totals on the green Gantt chart:
This WIP chart has the classic ‘Mount Kilimanjaro’ appearance of a demand-activity temporal mismatch: filling in the morning and emptying in the evening. The ‘WIP wobble’ (as Kate Silvester and I call it) determines the extra space capacity (i.e. beds) needed to buffer the WIP variation.
We see exactly the same pattern in outpatients—the WIP being the patients in the waiting room.
The thing I like about your second chart is that it's a run chart (I assume that you've calculated the median in order to draw the centre line), and I have to admit that it had never occurred to me to show work in progress as a run chart. It’s interesting that as soon as you do that to data with a 'Mount Kilimanjaro' pattern, you immediately see that there is special cause variation to deal with.
But hang on a minute, isn’t your 'Mount Kilimanjaro' run chart just a different way of visualizing my stepped bed occupancy chart? I’m talking about the second chart in my blog, the one that shows the number of beds occupied in AMU at different times of day:
Except that I’d argue that my chart has the advantage of showing time as it was “really experienced” (e.g. the long period of time between 3:00am and 9:00am when the number of beds occupied didn’t change) whereas your chart just logs the patients in sequence without taking this into account.
There is certainly variation over time on my WIP chart but you cannot say it is "special cause" because you will notice that my WIP chart is not plotted with control limits. The reason for this is because the assumptions that underpin the design of a Shewhart-style control chart are invalid when plotting work in progress. This means that the standard calculation for the control limits is incorrect so any ‘special causes’ flagged are myths. I can give you the technical chapter-and-verse if necessary. You are quite correct that your stepped occupancy chart is a form of WIP run chart but WIP charts are more generic. They will show the number of tasks in a process at any point in time even if the tasks are not taking up physical space. Patients sitting on a waiting list, for example.
Your version of the chart is plotted a different way—using the actual patient events—and mine is plotted at equal time intervals, like a stock count. Either way is OK and if the time interval on my chart is made smaller they will look identical.
A WIP chart can sometimes show no change. But there's a danger that we can be misled into thinking that there are no flow problems. If patients are admitted and discharged at the same time, for example, then the chart will show a flat line. This will happen in a ‘gridlocked’ system when it is one-out-one-in. The WIP chart must always be interpreted in the context of the other charts. If WIP is rising or falling then the Demand and Activity run charts are the next to look at. If WIP is steady then the Lead Time chart is the one to look at. I call these the Vital Signs of any process. There is a fifth one—the Yield chart—but that is a side-branch that we will not explore here.
Coming back to the original question: demand and activity are time or stream metrics so would be measured in units of patients per unit time. So the ‘capacity’ that is relevant to them is ‘flow-capacity’. WIP is a stage or space metric so is measured in units of ‘patients’ and the flavour of ‘capacity’ that is relevant is measured in different units: patients.
There is a relationship that links them though. It is called Little’s Law.
Before we get onto Little’s Law, can I just challenge you on your point about run charts and special cause variation? You implied that only charts with control limits are able to show special cause variation. But I have been taught otherwise. When a run chart “breaks the probability rules” (as I think the WIP chart here does: only three runs in a dataset with what looks like 18 useful observations), then that is evidence of special cause variation, is it not? Or do you give it a different name?
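The run-counting check being applied here can be sketched as follows. This is an illustrative fragment with made-up data, not the AMU series itself; a "run" is a maximal block of consecutive points all above or all below the median, with points sitting exactly on the median conventionally skipped.

```python
# Sketch: counting runs about the median in a time series.
# Too few runs for the number of useful observations is one of the
# standard run-chart signals of non-random variation.
# Data below are illustrative only.
from statistics import median

def runs_about_median(series):
    m = median(series)
    signs = [v > m for v in series if v != m]  # skip points on the median
    # Each sign change starts a new run.
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b) if signs else 0

data = [2, 3, 5, 9, 12, 14, 13, 11, 7, 4, 3, 2]
print(runs_about_median(data))  # a 'Mount Kilimanjaro' shape yields very few runs
```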
Anyway, that’s a bit of a digression (although I’d obviously be very interested in getting chapter-and-verse on why control charts are an inappropriate tool to describe work in progress).
I also take your point about needing to interpret WIP charts in the context of other charts. The one here, for example is an acute hospital AMU. But if you look downstream (about 60% of the discharges from the AMU in this hospital are transferred into a downstream specialty bed), then the WIP charts for the downstream wards would likely look very flat for the very reason that you described: one patient out; another patient (who’s probably been queuing in the AMU) immediately in to fill the just-vacated space.
But what I really want to know is how you arrive at a measure of capacity that matches a measure of demand that is defined as “number of patients per hour”. This is what you seem to be referring to as “flow-capacity”. Please tell me more. I’m guessing that it’ll involve an explanation of what Little’s Law is…
I will address all the questions you raise because it is possible that others have the same questions.
Firstly, the issue of WIP charts and ‘special cause’. The conventional XmR chart is not designed for WIP data because the statistical assumption that underpins the calculation of sigma is that adjacent points are independent samples—lots of little two-sample t tests, in fact. This assumption is required for the calculation of sigma to be valid and therefore the application of the Western Electric run tests based on sigma to be valid. A WIP chart is plotting the cumulative sum of the difference between demand (inflow) and activity (outflow). It is a form of CUSUM chart. This means that consecutive points are not statistically independent so the calculation of sigma is incorrect: it is too low. It is possible to correct for this effect but that is outside the scope of this conversation. The message is: do not plot WIP data on an XmR chart. I feel it is reasonable to use a run chart because the median helps to get a sense of the ratio of the wobble and non-wobble, but I doubt if the standard run tests are valid either for the same reason as above. That would be a topic for an academic statistician to explore perhaps.
You are quite right that when looking at a stream—which comprises a sequence of steps (i.e. where the stream crosses a stage)—then the WIP charts for each step provide very useful diagnostic information that help to interpret the upstream and the downstream charts. What we are looking for in this diagnostic phase is evidence of design flaws, and the patterns are very characteristic when you know what to look for. It is rather like diagnosing a patient disease from the pattern of symptoms, signs and test results. The diagnosis gives us a good starter for an effective treatment option.
So, to your final question: “How do we get a measure of flow-capacity?” By that I will take it to mean the maximum flow that the stream can sustain over a period of time. The potential activity. That is not the same as the measured activity, which will always be less when measured over a long enough period that the short-term variation can be averaged out. Design flaws cause the actual activity to be less than the potential activity. And when actual demand exceeds actual activity, that causes WIP to increase, and that in turn causes the system to run out of ‘space-capacity’, and then the lack of space becomes the flow-limiting step. Crunch! Then we get the “rigid” one-in-one-out behaviour. We can even get a phenomenon called ‘dead-locking’ where flow drops to zero. The simplest solution is to just add more space-capacity, and that works in the short term, but it is treating the symptom and not the cause: the design flaws. It is also very expensive if the extra space means more beds, more staff, more everything. The better approach is to diagnose and treat the design flaws.
Little’s Law describes the relationship between flow, lead time and work in progress. And by lead time I mean the time interval between two events: a patient arriving and the same patient leaving. Some books refer to this as the cycle time which is incorrect and confusing. Lead time is a stream metric; cycle time is a stage metric. I do not want to get side-tracked with a long debate on that because it is fully described in the FISH online course.
Little’s Law is a law of flow physics. The mathematical proof is actually quite recent (John Little published it in 1961), and the law takes its name from the author of that seminal paper. What it states is that the average work in progress equals the average flow multiplied by the average lead time.
This is a rather surprising statement because it does not assume a first-in-first-out (FIFO) queue. Little’s Law works even if tasks get out of order which is very handy in healthcare because that happens most of the time.
So Little’s Law gives us a convenient way to estimate the average space-capacity we need based on the measured average lead time and average flow.
It is important to note, however, that we are using averages here and that has two important implications: first, that the system must have a stable average to apply Little’s Law and, secondly, Little’s Law does not help us calculate the maximum space capacity, the extra space capacity above the average needed to accommodate the WIP-wobble. We need some slightly more sophisticated tools to do that.
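As a minimal sketch of that estimate (the function name is mine; the figures are the AMU averages quoted later in this conversation):

```python
# Sketch: Little's Law as an average space-capacity estimate.
# average WIP = average flow x average lead time.
# Units must agree: patients/day multiplied by days gives patients.

def average_wip(flow_per_day, lead_time_days):
    """Average work in progress implied by Little's Law."""
    return flow_per_day * lead_time_days

# AMU example: 21.2 admissions/day with a 0.9-day average length of stay.
print(round(average_wip(21.2, 0.9), 2))  # ~19.08 beds occupied on average
```

As noted above, this only pins down the average; sizing the buffer for the WIP-wobble needs more sophisticated tools.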
So your question about how to design a system so that it has sufficient flow capacity at every step of the stream to keep up with demand and not require a big patient warehouse or sequence of patient warehouses is the $64,000 question. The answer is to learn the theory, techniques and tools of flow-improvement-by-design which is a core component of Improvement Science. The first step on that learning path is FISH (the Foundations of Improvement Science in Healthcare), which is where we cover all the above and more with lots of realistic worked examples. At the start of the FISH course there is a video of a simulated A&E department that goes ‘crunch’ even though it was designed with ‘enough capacity and then some’. The reason the design fails is because the method used to design the space and resource capacity was the “Flaw of Averages and a Fudge Factor” which is neither reliable nor safe. And we all know that from our experience of the many failed improvement initiatives that have followed as a result of that approach!
OK, I think I at least got the gist of your control chart points! But I actually want to steer the conversation back in the Little’s Law direction.
You say that Little’s Law describes the relationship between (a) flow, (b) lead time and (c) work in progress. In the context of our AMU example, I’d be tempted to translate these three terms into (a) average number of admissions per day, (b) average length of stay and (c) average number of beds occupied.
If my translation is correct, then it turns out that I frequently invoke Little’s Law when I tell hospital managers that there is “simple arithmetic” that prevents you from trying to put a quart into a pint pot.
So let's go with your definition: average number of beds occupied = average number of admissions per day × average length of stay.
If we take a year’s data for this AMU, for example, we find that it is admitting—on average—21.2 patients per day, that the average length of stay of these patients is 0.9 days, and that the average number of beds occupied on each day is 19.08.
If we put these numbers into your formula, we get: 21.2 patients per day × 0.9 days = 19.08 beds occupied
…which—thankfully!—seems to work.
Also: because this is a 24-bed AMU, we can also deduce that the average percentage bed occupancy is: 19.08 ÷ 24 = 79.5%.
Now, let’s assume that my interpretation of Little’s Law is correct there. The trouble is, as you say, it doesn’t help us determine what the “right” length of stay is, or what the “right” number of beds is for any given number of admissions per day.
But the way I tend to think is that—given that in most NHS scenarios, the number of beds is fixed, and neither can we do much (in the short term at least) about the number of admissions coming in each day—we therefore need to focus our attention on length of stay.
But we do need to know what the right level of bed occupancy is in order to cope with the variation in demand. On a busy day in this AMU, for example, there can be 40 admissions; on a quiet day, as few as 10. And in order to find out what the “right” level of occupancy is, I often adopt the "pragmatic" method of saying: “Let’s look at the days when you didn’t have any problems. What was the bed occupancy like on those days – and maybe that’s what your ideal bed occupancy should be.”
So in this AMU, for example, we might look at a year’s worth of days, then pick the days when there appeared to be no delays in accommodating admissions (which we could measure very crudely by looking at the days when there were no breaches of the four-hour target), and we could see what the average bed occupancy was on those days. Suppose it turns out to be 75%, then we could go back to the Little’s Law equation and change the 19.08 to 18 (that’s what 75% of 24 is), and see what the average length of stay needs to be in order to keep the system stable at an average of 75%: 18 beds ÷ 21.2 admissions per day = 0.85 days.
Average length of stay needs to drop from 0.9 days to 0.85 days in order to maintain flow. How does that sound?
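That back-of-envelope rearrangement can be sketched as follows (variable names are mine; the figures are the AMU numbers from this conversation):

```python
# Sketch: Little's Law rearranged to find the length of stay implied
# by a target occupancy. lead time = WIP / flow.

beds = 24
target_occupancy = 0.75
admissions_per_day = 21.2

target_wip = beds * target_occupancy            # 18.0 beds occupied on average
required_los = target_wip / admissions_per_day  # average length of stay, in days
print(round(required_los, 2))  # ~0.85 days
```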
That sounds OK to me. You are using Little’s Law exactly as I would: to give a sense of where the average needs to be to have any hope of expecting non-chaotic behaviour. As a pragmatic heuristic—a rule of thumb—to get us started. It is also very useful for sanity-checking a more complex design exercise. What you are doing here is asking: ‘Of the three interdependent variables in Little’s Law, which fall into my circle of control?’ Average demand (i.e. flow) probably does not in the urgent care context. The number of beds available to be occupied (i.e. space-capacity) is hard to change in the short term, and anyway warehouses full of sick patients are both dangerous and expensive. So that leaves lead time.

Now this is where it gets interesting, because lead time is not an input in a system design exercise; it is an output. We can only influence lead time by tweaking our system design, and that means the design of our system policies: the system software, so to speak. This is one reason why setting a lead time performance target is rather ineffective: it is like checking-then-scraping burned toast. If the design of the toaster is such that it burns toast, then having a sophisticated toast scraper is just perpetual fire-fighting, quite literally. What we need is the how-to-design-a-toaster-that-makes-OK-toast-100%-of-the-time skill. Not perfect toast, note: OK toast. And that is what FISH and ISP are designed to develop: Improvement-by-Design capability.
And just to add some more chilli to the sauce: it is not demand variation that is the important factor, it is load variation, which is not the same thing. Not the same thing at all. It has different units. At the start of the FISH course is a short video of a Bird’s Eye View of a simulated A&E Department. The model is deliberately simplified to illustrate a characteristic pattern of system behaviours called a ‘predictable catastrophe’. The cause is poor policy design.
Well, you’ve definitely got me feeling guilty about the fact that I haven’t yet embarked properly on FISH! But can I just see if we can develop the discussion around what you just said about lead-time:
The problem I have with that is that in a lot of NHS scenarios what we are going to be doing is trying to safely reduce length of stay. Any ideas that people have that might accomplish this are worth testing and evaluating. Then you can try and model the impact. If doing x instead of y means that x% instead of y% of patients can be discharged two hours earlier, then that would bring mean length of stay down to z and then we’d get flow back again. That kind of thing.
So although length of stay might not be a target in the “four-hour target” sense of the word, we’d surely need to have a figure that we’d be aspiring to achieve, would we not? But you seem to be suggesting that this having-it-as-a-target is somehow a “bad thing”. I was puzzled by that.
The problem here is that the word ‘target’ is being used in two slightly different ways, and we could use the word ‘standard’ and still make the same error. In process design we have a ‘design specification’, and it is always expressed as a range, not as a single number. This is because variation is always present, and fudging the issue using averages is a sin called the ‘Flaw of Averages’ error. The single-value arbitrary lead time performance targets that we see used—such as four hours in A&E and 18 weeks in planned care—are perhaps well intended, but they are a poor design, especially if ‘failing’ leads to punitive action. The lead time is a measured stream behaviour: a consequence of the design. It is reasonable to have a lead time specification expressed as a range and to plot that range on the time series chart to assess the design ‘capability’, as it is called. What we want is a design that is capable of meeting the specification, a design that can be trusted, one that is fit-for-purpose, without the need for sticks and carrots.

So we could start with a 0-4 hours lead time specification for A&E and then design the process (the policies, in actual fact) so that the A&E system is capable of meeting the specification. We do not appear to do that in the NHS because we have never been trained how to. And the very fact that we are having this conversation is evidence of that assertion. I was taught the principles of flow science over 30 years ago—before I started my clinical training—in the guise of operating system design. The flow of data through computers is an essentially identical design challenge.
I am a bit reluctant to open another can of worms as we have already skated rather quickly over a lot of important foundation material. So I will just bait the FISH hook once again. Load is what we must work out in order to estimate the resource-capacity we need. It is resource-capacity that drives the cost model. So to create a system design that is financially viable we need to know how to use both load and load variation. Unfortunately, healthcare finance does not appear to know how to do that so, as I said, this is a can of worms. Load is also what we need to understand when designing schedules, and that has immediate and practical applications. The A&E catastrophe problem I pose at the start of FISH requires an understanding of load in order to solve it; and the example I share at the end of FISH (called Dr Grumpy’s Clinic) illustrates the how-to-do-it for a rather common and persistent niggle: patients waiting in outpatients. In my experience most outpatient waiting can be eliminated overnight and at zero cost. Now I feel that is worth a bit of investment in learning how to do it, don’t you?
© Kurtosis 2014. All Rights Reserved.
Comments on this article
28 January 2014:
FISH is brilliant and has helped me enormously in being able to articulate the work I have been doing for many a year. I would have loved to have been in on the conversation - I use Simon's Baseline Software all the time.
Surgical Pathway Redesign, Department of Health, Victoria, Australia
28 January 2014:
Brilliant, Guys! I've passed it on to the underground movement already!
Managing Director, Kate Silvester Ltd
29 January 2014:
Great discussion, careful on using definitions and then using these to propose actions. Would be useful to have it distilled. There is more here to add to the argument for abolishing targets, number one being 4 hours. It's a bit like designing a company to have a profit of precisely £1,000/month.
But it's handy to say what to do differently. Understanding demand, and the nature of demand which you call load, is essential and I suggest this should be the start of the discussion. In primary care, taking this approach has enabled us to drop the lead time to speak to a GP below 30 minutes. Yet it is not designed around lead time.
Chief Executive, Patient Access Ltd
30 January 2014:
That is a great conversation chain. Should be prescribed reading!
Assistant Director (Service Improvement), Sheffield Teaching Hospitals NHS Foundation Trust