The Other Code Smell – Rot – Richard Arneson's Thunkingspot

Code Rot conjures up images from sci-fi more than something to be seriously concerned about. Rot or decay implies a loss of integrity, a loss of capability or suitability for the intended purpose, succumbing to pathogens, all accompanied by a bad smell. Since all these things can be observed in software systems, if somewhat metaphorically, rot would seem to be an apt description. Software rot is driven by the changes we make (internal factors) and changes in our ecosystem (external factors). Even if we could hold all the internal factors constant, for a deployed, operational software system the external factors are constantly changing. The threat environment constantly advances, requiring software systems to respond. The law changes. Data communication paradigms change. Irreplaceable hardware fails. The ability to engage talent to maintain aging technology stacks becomes constrained. You will need to add capability to meet a new demand associated with your core business objectives, and, eventually, you will need to do so using technology that cannot easily coexist with systems that have not changed in years. To operate a software system is to have a tiger by the tail. You can never let go.

If your enterprise builds software as a strategic capability, you will have internal factors that drive software rot as well. These will be more insidious. Even relatively small software systems will manifest the cumulative impact of thousands upon thousands of engineering decisions, every one of which makes a small contribution to the growing health of the system, or a small contribution to the growing pathology of the system. The one thing the system will not do is remain static. We may not like change, but change likes us. If you can acquire the capabilities to consume change, for a cost your business can tolerate, you can make change a friend – a competitive advantage. If not, change will be an adversary, with a predictably disastrous end game. It’s just a matter of time.

Every change has initial and incremental costs attached to it. No one would think it was reasonable to ignore, indefinitely, the incremental costs to maintain a physical piece of capital equipment. We understand that deferred costs will eventually exceed the cost of replacement, but, at least, we have a pretty good understanding of replacement costs when it comes to capital equipment. It may even be a perfectly rational decision to defer maintenance. In the case of software systems, the impact of deferring incremental costs can be deceptively opaque. For internally built software the cost of replacement is even more opaque. This opacity invites false confidence, like a clear blue sky, when it should convey a sense of risk, like a dark alley. We may understand the explicit costs of licensing and upgrades, but we have difficulty understanding the combinatorial expansion of cost drivers as software ages. You might object that of course we understand what software costs over time; accounting is on our case about it all the time. True, but we aren’t very good at understanding why it costs what it does. We understand what things cost to do as we have historically done them. But what we have historically done is the metaphorical equivalent of driving an automobile into the ground without ever changing the oil. We are less experienced at understanding the ongoing costs to operate a software system that is being kept healthy. We have remarkably few healthy software systems to base this on, and we don’t have good quantitative information, at least none that is widely available, about how our engineering practice drives the incremental cost of change. However, we do have anecdotal information. It is not hard to see that very poor engineering practice will cause each increment of change to become more costly because it has to be done within the context of the last [poorly executed] increment of change, and so on. The cost for an increment of change rises over time. This is the equivalent of not making the interest payment on a debt. The interest rate on tech debt is driven by our engineering practices. It is also not hard to envision that truly excellent engineering practices could result in a falling cost, over time, for an increment of change. Every system will operate between these two poles. If we cannot at least keep the cost of change stable, we have the necessary and sufficient conditions for software rot. If you stay in the industry for any length of time you will have occasion to work with a system that seems to have ground to a halt because the cost of change has become too high. What is especially tragic is that there may be a tipping point where there is no path to a stable cost of change that can be paid for with revenues derived from the software system. Fortunately, if the system has not yet reached this point, we do have a good understanding of engineering practices that are “safe bets”. And if we care to, we can measure the cost of change over time, and build an organic understanding of what is working for us, and what is not.
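
To make the debt analogy a little more tangible, here is a minimal sketch in Python. It assumes, purely for illustration, that each increment of change costs a fixed percentage more (or less) than the previous one, so the "interest rate" is a single constant. Real systems will not be this tidy. The function names, the compounding model, and every number below are hypothetical and are not drawn from any measured system.

```python
def cost_of_change(base_cost, rate, n):
    """Cost of the n-th increment of change (n = 0 is the first).

    Assumed model: each increment costs (1 + rate) times the previous one.
    rate > 0 stands for poor practice, rate < 0 for excellent practice,
    and rate == 0 for a stable cost of change.
    """
    return base_cost * (1 + rate) ** n


def tipping_point(base_cost, rate, revenue_per_increment, max_increments=10_000):
    """Index of the first increment whose cost exceeds the revenue available
    to pay for it, or None if that never happens within max_increments."""
    cost = base_cost
    for n in range(max_increments):
        if cost > revenue_per_increment:
            return n
        cost *= 1 + rate  # compound the cost for the next increment
    return None


def estimate_rate(observed_costs):
    """Average per-increment growth rate implied by a series of measured
    costs, using only the first and last observations (geometric mean)."""
    if len(observed_costs) < 2:
        raise ValueError("need at least two observations")
    first, last = observed_costs[0], observed_costs[-1]
    if first <= 0 or last <= 0:
        raise ValueError("costs must be positive")
    return (last / first) ** (1 / (len(observed_costs) - 1)) - 1


if __name__ == "__main__":
    # Hypothetical figures: a first change costing 100 units, funded by
    # 150 units of revenue per increment, under three engineering regimes.
    for label, rate in [("poor", 0.10), ("stable", 0.0), ("excellent", -0.05)]:
        tenth = cost_of_change(100, rate, 9)
        print(label, round(tenth, 2), tipping_point(100, rate, 150))
```

Under this assumed model, a stable or falling rate never reaches the tipping point, while any positive rate eventually does; the only question is when. If you record what each change actually costs, `estimate_rate` gives a rough, directional read on which pole your own system is drifting toward.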

Are there concrete examples of how this works out in practice? I suspect many of us could come up with a list of ventures that seemed to collapse under their own weight. But I think we can learn something more from the cost of cloud services. One of the attributes of a healthy software system would certainly be that it offers a predictable stream of value for a predictable stream of costs. It is remarkable that so few software stacks can truly be said to possess this attribute, but cloud services might be the closest readily visible example we have. I don’t think our understanding of how engineering practice affects cost quantitatively is sophisticated even here. But at least we have an aspirational assertion that cloud services are maintained at some level of capability, with some level of quality, for some predictable cost. Assume for the sake of argument that cloud services encapsulate our best understanding of good engineering practice, sold by the pound, to provide an increment of “good” for an increment of cost. One of the many interesting things we learn about cloud services is… they are expensive. Are they more expensive than an owned and operated system? The sticker shock we experience is certainly real, but honest comparisons are hard. Renting a little space in a trucking fleet is not comparable to buying and maintaining your own delivery truck. They do not have comparable capabilities. I know I can, for a time, neglect the maintenance on my truck. And if the truck is paid for, it can appear very inexpensive to operate – for a time. Cloud services don’t offer that kind of cost “flexibility”. It is reasonable to think they provide the best rendering we have of the increment of cost associated with an increment of capability maintained against a target level. It may only be a directional level of insight, but I think it suggests that the cost of predictable outcomes is rather startling. I don’t say this to suggest cloud services cost too much. I’m suggesting that what cloud services cost should perhaps tell us something about the realistic cost expectations for what we build on top of them if we don’t want it to rot.

A reasonable question would be – why don’t we seem to be more interested in understanding whether our practices assure a stable cost of change? Well, someone certainly is interested. The people whose investment ends up a smoldering ruin would probably like to know more about how it could have gone on being a cash cow for longer. Perhaps incentives aren’t well aligned for this to be part of the conversation. If it were, how many prospective business ventures would be exposed as non-starters early on? Or, for that matter, what impact would this have on how a venture is valued for acquisition? A lot of people have been quite successful doing things the way they have been done for decades now. Perhaps lucrative opportunities have been thick enough on the ground that we haven’t really needed to manage the cost of change rigorously. Even if that is true, I see no reason to believe it will remain so. Our industry is changing, and I would suggest that understanding the cost of change more fully will be a significant competitive advantage.