“If I had four hours to chop down a tree, I’d spend two hours sharpening my axe.”
Rather than following Lincoln’s wise advice, many teams feel they don’t have time to sharpen their axes because they simply have too many trees to cut. The tasks are so urgent that there’s no time for a proper lunch, let alone for improvements. So the teams keep working with sub-optimal processes, and the backlog grows and grows and grows. This axe-sharpening paradox raises a question: how quickly does an improvement effort pay off?
Many teams believe that if they just had a few extra people, they could easily get everything back on track. Maybe, but the costs, if paid in every team, could bankrupt the company. Furthermore, throwing resources at a problem is a lazy shortcut that doesn’t actually cure the disease. A better approach is to improve our ways of working so that we become faster and more efficient and reduce process variability, all without adding resources. The best example remains Toyota, whose very lack of resources in the 1940s and 1950s forced it to invent what we now call lean manufacturing. Its kaizen approach placed a premium on improvement and rigorously refused to cut trees with blunt axes.
This problem made me curious. How quickly does kaizen pay off? In other words, if we invest time in improving, how fast will we reduce the backlog? To explore this question, I created a Monte Carlo simulation of how a team handles IT standard requests./1/ Lean approaches usually focus on eliminating waste, which is paramount. This simulation, however, more narrowly models the speed of the pay-off, which lets us test questions such as how much time a team should dedicate to improvements.
Simulation Set-up
In Excel, I set up a scenario where a four-person team faces an average of 220 requests a week. Its productive time, 25 hours per member, matches or is slightly less than the average time needed. The requests come from 50 different types with varying average base effort and frequency. We then generate the number of occurrences of each type and its duration randomly, using bell curve assumptions (see “Appendix on Method” below for the details).
The simulation covers 50 weeks. Even though productive capacity is set to match the average effort required each week, a request backlog grows in 80-90% of the runs. This result is expected given the randomness of both the effort needed and the arrival frequency. Some requests take longer than average, which creates a backlog, and once the team falls behind, it is hard to catch up. By contrast, when the team has no backlog, it cannot bank its surplus time.
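The asymmetry described above fits in a few lines of code. The following is a minimal sketch, not the Excel model itself. It makes two simplifications. First, weekly demand is drawn from a normal distribution whose mean equals the 6000-minute capacity. Second, the standard deviation is a hypothetical 600 minutes. Unmet work carries forward, while unused capacity is discarded.

```python
import random

CAPACITY = 6000  # team minutes per week: 4 people x 25 hours x 60

def simulate_backlog(weeks, mean_demand, sd_demand, capacity, rng):
    """Return the backlog (in minutes) at the end of each week.

    Unfinished work carries over to the next week, but spare capacity is
    lost: the backlog is floored at zero, so a quiet week cannot be banked.
    """
    backlog = 0.0
    history = []
    for _ in range(weeks):
        # Weekly demand drawn from a bell curve, clipped so it is never negative.
        demand = max(0.0, rng.gauss(mean_demand, sd_demand))
        backlog = max(0.0, backlog + demand - capacity)
        history.append(backlog)
    return history

def share_of_runs_with_backlog(runs=200, weeks=50, sd_demand=600, seed=1):
    """Fraction of runs that end the last week with a non-zero backlog."""
    rng = random.Random(seed)
    with_backlog = 0
    for _ in range(runs):
        final = simulate_backlog(weeks, CAPACITY, sd_demand, CAPACITY, rng)[-1]
        if final > 0:
            with_backlog += 1
    return with_backlog / runs

if __name__ == "__main__":
    print(f"Runs ending with a backlog: {share_of_runs_with_backlog():.0%}")
```

The floor at zero only ever removes negative values. That is why the backlog tends to drift upward even though demand and capacity match on average. The exact share of runs with a backlog depends on the assumed spread.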
From this basis, the team engages in a kaizen initiative in which it standardizes and improves each of the request types. The kaizen time invested is deducted from the productive capacity, mimicking the axe-sharpening situation where a team feels it cannot afford the improvement time. The simulation lets us vary the number of improvements per week, how much time is invested in each, and the order in which the request types are improved. With these variables, we can test how investing in improvements affects the backlog. In short, how quickly does kaizen pay off? We can also ask questions such as: given a certain expected reduction in average effort, is there an optimal amount of improvement effort to invest? Or, how large must the improvement be to make the investment worthwhile?
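The pay-off question can be framed as a simple cash-flow-style calculation. This is a hedged, deterministic sketch: it ignores randomness and the backlog entirely. The inputs are a hypothetical list of expected weekly load per request type, given in improvement order. Each improvement costs `invest_minutes` of capacity in the week it is made and cuts that type's load by a fixed fraction from the following week on. `payback_week` reports when the cumulative gain over doing nothing first turns positive.

```python
def weekly_spare_capacity(type_loads, invest_minutes, reduction,
                          improvements_per_week, weeks, capacity=6000):
    """Expected spare capacity (minutes) in each week of a kaizen programme.

    type_loads: expected weekly minutes spent on each request type, listed in
    the order the types will be improved (hypothetical input).
    Each improvement costs invest_minutes of capacity in the week it is made
    and cuts that type's load by the fraction `reduction` from the next week on.
    """
    loads = list(type_loads)
    next_type = 0
    spare = []
    for _ in range(weeks):
        demand = sum(loads)
        n = min(improvements_per_week, len(loads) - next_type)
        spare.append(capacity - n * invest_minutes - demand)
        for i in range(next_type, next_type + n):
            loads[i] *= 1 - reduction
        next_type += n
    return spare

def payback_week(type_loads, capacity=6000, **kwargs):
    """First week in which the cumulative gain over doing nothing turns positive."""
    baseline = capacity - sum(type_loads)
    cumulative = 0.0
    spare = weekly_spare_capacity(type_loads, capacity=capacity, **kwargs)
    for week, minutes in enumerate(spare, start=1):
        cumulative += minutes - baseline
        if cumulative > 0:
            return week
    return None  # the investment has not paid back within the horizon
```

For example, take a hypothetical type carrying 120 minutes of weekly load, improved by 10% for a 60-minute investment. It saves 12 minutes a week, so the investment is recovered five weeks after the improvement week.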
Results
The results are only preliminary, but they do show that we can answer such questions. Since I couldn't bear confronting you with such a long article without any pictures, I include a few here as examples. The figures compare the base situation (without improvements) with two scenarios: an investment of 60 minutes achieving a 10% improvement, and an investment of 180 minutes achieving a 20% improvement.
Figure 1: Average size of backlog at the end of 50 weeks
First, we note that the average backlog (figure 1) moves from nearly 566 in the base condition to just
Figure 2: Histogram of 200 runs of the backlog at the end of 50 weeks
Figure 3: Average Effort and Average Effort Backlog/Surplus
variability was improved to the same degree. These results are confirmed in the histogram in figure 4.
Figure 4: Histogram of effort surplus / backlog
As mentioned above, these are the first results from the tool. I shall continue to experiment with it and tweak it, and then present further results in a follow-up article.
Appendix on Method
Team
A team of four each spends twenty-five hours per week on requests, giving the team a total productive capacity of 100 hours. For the sake of the set-up, I wanted the team capacity to match the average time needed for the requests, since this is a common, albeit faulty, resource planning method. We also assumed that all of the productive capacity actually goes into request processing, i.e. that no time is lost between requests. The lost time falls within the other 15 hours per week.
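The capacity arithmetic behind this set-up can be checked quickly. It also shows what matching capacity to demand implies for the average request. This is a small sketch using the figures given in the article (four people, 25 productive hours each, about 220 requests per week).

```python
TEAM_SIZE = 4
PRODUCTIVE_HOURS_EACH = 25   # hours per member per week spent on requests
AVG_REQUESTS_PER_WEEK = 220  # average weekly volume from the set-up

capacity_hours = TEAM_SIZE * PRODUCTIVE_HOURS_EACH   # 100 hours
capacity_minutes = capacity_hours * 60                # 6000 minutes
# If capacity is planned to match average demand, the average request
# must take roughly capacity / volume minutes.
implied_minutes_per_request = capacity_minutes / AVG_REQUESTS_PER_WEEK

print(f"Capacity: {capacity_hours} h = {capacity_minutes} min per week")
print(f"Implied average effort: {implied_minutes_per_request:.1f} min per request")
```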
We are assuming (admittedly unrealistically) that there are no hand-offs and therefore no delays due to waiting./2/ We also assume that all team members have all the skills necessary to complete the tasks.
Requests
There are 50 different types of IT requests, with the average effort for each type ranging from ten minutes to four hours. To set this up, I used a beta distribution skewed so that there are more short request types than long ones; in this case, 60% of them require less than 45 minutes. Furthermore, each request type was assigned a frequency factor inversely proportional to its effort. Just as password resets are more common than setting up a complex database system, we assumed that requests taking 10-15 minutes occur about 20 times more frequently than those taking three to four hours. While the average effort and frequency factor of each request type were fixed, the actual effort and frequency of each type varied randomly from week to week during the simulation (see below).
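The request catalogue can be sketched as follows. Several details are assumptions, since the article does not give them. The beta shape parameters (alpha = 1, beta = 5.5) were chosen because they put roughly 60% of types below 45 minutes when scaled to the 10-240 minute range. The frequency factor is taken as exactly inversely proportional to effort and scaled so that total expected load equals the 6000-minute capacity.

```python
import random

MIN_EFFORT, MAX_EFFORT = 10, 240   # minutes, from the article
ALPHA, BETA = 1.0, 5.5             # assumed shape: skewed toward short types
N_TYPES = 50
WEEKLY_CAPACITY = 6000             # minutes

def make_request_types(seed=42):
    """Return a list of (mean_effort_minutes, expected_weekly_count) per type."""
    rng = random.Random(seed)
    means = [MIN_EFFORT + (MAX_EFFORT - MIN_EFFORT) * rng.betavariate(ALPHA, BETA)
             for _ in range(N_TYPES)]
    # Frequency inversely proportional to effort: every type then carries the
    # same expected weekly load, scaled so that total load equals capacity.
    load_per_type = WEEKLY_CAPACITY / N_TYPES
    return [(m, load_per_type / m) for m in means]

if __name__ == "__main__":
    types = make_request_types()
    short = sum(1 for m, _ in types if m < 45)
    print(f"types under 45 min: {short}/{N_TYPES}")
    print(f"expected requests per week: {sum(c for _, c in types):.0f}")
    print(f"expected effort per week: {sum(m * c for m, c in types):.0f} min")
```

With exact inverse proportionality, a 12.5-minute type occurs about 17 times as often as a 210-minute type. That is in the same range as the article's "about 20 times".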
Based on the duration, the frequency factor and the set-up of the random generator, the simulation produces between 170 and 280 request instances per week, with an average of about 220. Depending on the parameters chosen, the total effort per week nearly matches the 6000 minutes of capacity.
Simulation
The simulation covers 50 weeks. The spreadsheet therefore contains 50 rows, one for each request type, and 50 columns for weeks 1 to 50. Each week, the random generator produces a number of occurrences of each request type based on its frequency factor. This way, a short request might occur 7 times one week and 12 times the next, whereas a long request might occur just once or not at all.
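The 50 × 50 grid of weekly occurrences might look like this in code. This is a sketch under stated assumptions. Following the article's "bell curve" wording, each count is drawn from a normal distribution around the type's expected count, rounded to a whole number and clipped at zero. The relative spread `rel_sd` is a hypothetical parameter, not the article's.

```python
import random

def weekly_counts(expected_counts, weeks=50, rel_sd=0.3, seed=7):
    """Return a grid counts[type][week] of request occurrences.

    Assumes a normal ("bell curve") draw around each type's expected count,
    rounded to a whole number and clipped at zero; rel_sd is hypothetical.
    """
    rng = random.Random(seed)
    grid = []
    for expected in expected_counts:
        row = [max(0, round(rng.gauss(expected, rel_sd * expected)))
               for _ in range(weeks)]
        grid.append(row)
    return grid

if __name__ == "__main__":
    grid = weekly_counts([10.0, 0.5])   # a frequent type and a rare one
    print("frequent type, first 5 weeks:", grid[0][:5])
    print("rare type, first 5 weeks:", grid[1][:5])
```

A Poisson draw would be a common alternative for counts. The normal draw is used here only to stay close to the article's description.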
The time needed for each request was also chosen randomly, in this case assuming a normal distribution (bell curve) based on the type's average effort and a variability factor./3/
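One cell of the grid (count × duration for a single type and week) can be sketched as a sum of normally distributed durations. The mapping from variability factor to standard deviation (sd = variability × mean) is an assumption. The clip at zero is only a simple guard against negative times, not the article's lower-end truncation.

```python
import random

def type_week_effort(count, mean, variability, rng):
    """Total minutes for `count` requests of one type in one week.

    Each duration is drawn from a normal distribution with the type's mean;
    the standard deviation is variability * mean (an assumed convention).
    Negative draws are clipped to zero as a simple guard.
    """
    sd = variability * mean
    return sum(max(0.0, rng.gauss(mean, sd)) for _ in range(count))

if __name__ == "__main__":
    rng = random.Random(3)
    # e.g. nine password-reset-style requests averaging 12 minutes
    print(f"{type_week_effort(9, 12, 0.3, rng):.1f} minutes this week")
```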
The simulation then runs 200 times, producing 500,000 cells of request count multiplied by duration (50 request types × 50 weeks × 200 runs). (Amazingly enough, Excel handles this with ease on a laptop: a change in parameters takes just 1-2 seconds to process.)
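The repetition step is generic: run the same 50-week scenario many times and summarise the outcomes, for example as the averages and histograms shown in the figures. A minimal sketch follows. The `run_once` interface is hypothetical: any function that takes a random generator and returns one number, such as the backlog at the end of week 50.

```python
import random
import statistics

N_TYPES, N_WEEKS, N_RUNS = 50, 50, 200

def cells_evaluated(n_types=N_TYPES, n_weeks=N_WEEKS, n_runs=N_RUNS):
    """Number of (request count x duration) cells across all runs."""
    return n_types * n_weeks * n_runs

def monte_carlo(run_once, n_runs=N_RUNS, seed=2024):
    """Repeat one simulated scenario and return (mean outcome, all outcomes).

    run_once takes a random.Random and returns a number (hypothetical
    interface), e.g. the backlog at the end of the last week.
    """
    rng = random.Random(seed)
    outcomes = [run_once(rng) for _ in range(n_runs)]
    return statistics.mean(outcomes), outcomes

if __name__ == "__main__":
    print(f"{cells_evaluated():,} cells")
    mean, _ = monte_carlo(lambda rng: rng.gauss(500, 100))  # toy scenario
    print(f"mean outcome: {mean:.0f}")
```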
Improvements
For the improvements, we can choose the amount of time invested in each improvement, the number of improvements per week and the estimated reduction (in percent) in the time needed. We can also adjust the reduction in variability. This matters because immature processes have high variability, and one aim of standardization is to reduce it. So if a request type previously averaged 45 minutes, with a range of 30 to 90 minutes, the improvement might reduce the average to 35 minutes and also narrow the range to 28 to 45 minutes.
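An improvement can be modelled as two independent percentage cuts: one to the average effort and one to the variability. This is a sketch assuming variability is expressed as a standard deviation relative to the mean. The starting variability of 0.4 and the 50% variability reduction below are hypothetical; the 45-to-35-minute change mirrors the article's example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RequestType:
    mean: float         # average effort in minutes
    variability: float  # standard deviation as a fraction of the mean (assumed)

def improve(rt: RequestType, effort_reduction: float,
            variability_reduction: float) -> RequestType:
    """Return the request type after one kaizen improvement.

    Both reductions are fractions, e.g. 0.2 for 20%.
    """
    return RequestType(mean=rt.mean * (1 - effort_reduction),
                       variability=rt.variability * (1 - variability_reduction))

if __name__ == "__main__":
    before = RequestType(mean=45, variability=0.4)
    # 45 -> 35 minutes is a reduction of 10/45, roughly 22%.
    after = improve(before, effort_reduction=10 / 45, variability_reduction=0.5)
    print(before, after, sep="\n")
```

Under these assumptions the standard deviation falls from 18 to 7 minutes. The reduction in variability does about as much to shrink the tails as the lower average does.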
Finally, we can choose the order in which the improvements are implemented: simply in order from 1 to 50, from longest to shortest, or from shortest to longest.
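The three orderings amount to sorting request-type indices by average effort. This is a small sketch; the strategy names are hypothetical labels.

```python
def improvement_order(mean_efforts, strategy="sequential"):
    """Return request-type indices in the order they will be improved.

    strategy: "sequential" (1 to 50), "longest_first" or "shortest_first"
    (hypothetical labels for the three options).
    """
    indices = list(range(len(mean_efforts)))
    if strategy == "sequential":
        return indices
    if strategy == "longest_first":
        return sorted(indices, key=lambda i: mean_efforts[i], reverse=True)
    if strategy == "shortest_first":
        return sorted(indices, key=lambda i: mean_efforts[i])
    raise ValueError(f"unknown strategy: {strategy}")

if __name__ == "__main__":
    efforts = [30, 10, 240]
    for s in ("sequential", "longest_first", "shortest_first"):
        print(s, improvement_order(efforts, s))
```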
Notes
/1/ Douglas W. Hubbard, How to Measure Anything: Finding the Value of Intangibles in Business, Third edition (Hoboken, New Jersey: John Wiley & Sons, Inc, 2014), provides the best introduction to creating simple Monte Carlo simulations that nevertheless work wonders in reducing the uncertainty surrounding a decision.
/2/ Waiting is the largest source of process inefficiency in an IT context. See, most recently, Frank Verbruggen and Jeff Sutherland, “Process Efficiency – Adapting Flow to the Agile Improvement Effort,” n.d., 7. I have written about this problem before in these pages. It is a huge issue, but the present article asks a different question.
/3/ Douglas Hubbard shows the rudiments of constructing these kinds of models. Ideally, we would use a beta distribution here. Beta distributions are generally seen to match the nature of human work: there are real physical limits preventing us from going faster, but there is no limit to the amount of time we can spend on something if problems occur. However, beta distributions are much more complicated to set up and use much more computing power than a normal distribution. The Excel function is =NORM.INV(), which takes a probability (here generated randomly), the mean and a standard deviation computed from the variability factor. To approximate the beta distribution, I truncated the processing time at the lower end. Perhaps I can improve this in future versions.
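Feeding a random probability into =NORM.INV() is inverse-transform sampling. In Python, `statistics.NormalDist.inv_cdf` plays the same role. The sketch below makes two assumptions, since the note does not spell out either formula. The standard deviation is the variability factor times the mean. "Truncating at the lower end" means clipping draws at a floor, set here at a hypothetical fraction of the mean.

```python
import random
from statistics import NormalDist

def excel_style_duration(mean, variability, floor_fraction, rng):
    """Mimic =MAX(floor, NORM.INV(RAND(), mean, sd)) from a spreadsheet.

    sd = variability * mean and floor = floor_fraction * mean are assumed
    conventions, not the article's exact formulas.
    """
    sd = variability * mean
    p = rng.random()          # like RAND(): uniform on [0, 1)
    while p == 0.0:           # inv_cdf requires 0 < p < 1
        p = rng.random()
    draw = NormalDist(mean, sd).inv_cdf(p)
    return max(floor_fraction * mean, draw)

if __name__ == "__main__":
    rng = random.Random(5)
    draws = [excel_style_duration(45, 0.4, 0.6, rng) for _ in range(5)]
    print([round(d, 1) for d in draws])
```

Clipping piles the cut-off probability onto the floor value and nudges the average slightly upward. Redrawing until a value clears the floor would be the stricter reading of "truncated".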