Duration Mismatch In Code – Richard Arneson's Thunkingspot

Yes, this is an article about software engineering.

No, I am not talking about the Silicon Valley Bank failure… Or am I?

Duration mismatch is the difference in time to maturity between a company’s assets and liabilities. (Also note the adjacent concepts of duration risk and duration gap if you’re the sort who wants to dig deeper.)

We already use the financial analogy of tech debt in software. It seems only reasonable we might want to think about a concept closely related to debt – duration mismatch. As software engineers we have a veritable smorgasbord of ways to get things wrong. So much so that getting things right can seem like a hopelessly narrow way indeed (The Other Code Smell – Rot). And this is where I say, “It’s even worse than you think.” Let’s just imagine your team has its ducks in a row and all their feathers numbered. You are risk aware and adept at managing that risk. I invite you to consider a different kind of risk – something that is subtle and all around us.

This thing all things devours;
Birds, beasts, trees, flowers;
Gnaws iron, bites steel;
Grinds hard stones to meal;
Slays king, ruins town,
And beats mountain down.
        J.R.R. Tolkien, The Hobbit

The answer, of course, is time. Not the problem of too little time – volumes have been written about that. Rather, the problem of too much time. Specifically, two facts that obviously spell disaster when considered together can appear quite innocent when they are separated by time.

Consider an example. Over-constrained situations occur. It’s part of life (Constraints Are Your Job). And, because our feathers are numbered, we have ways of dealing with them. We reduce the scope and effort of the work, usually by accepting a loss of robustness, scalability, extensibility, and probably other “ilities”. Basically, we take a deep breath and borrow… from the future. Imagine you deploy software with a known bug (I know this never happens… but work with me). You did all the right things. The risk was evaluated and determined to be minimal because, for example, the bug only occurs when a new API is called for a customer who has more than one payment method.

“But we don’t have any customers with more than one payment method.”

“We don’t allow customers with more than one payment method.”

“We may never allow customers with more than one payment method.”

“It would be foolish to let customers have more than one payment method. And the greybeard says there is probably code all over the place that assumes there can only be one payment method.”

“I know we allowed the API to accept a list of payment methods because sales wants to allow more than one payment method in the future, but realistically this is never going to happen… anytime soon.”

“There are just too many other features that are more important and we have workarounds that allow much the same functionality.”

“Well, even if it gets called, the request just fails – bad customer experience but we don’t save invalid data.”

So, the software was deployed. You even added a work item to the backlog to revisit this. Ok, now I’m just being mean.
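One way to keep that last assurance honest is to make the one-method assumption fail loudly in code, rather than letting it live only in the conversation. The sketch below is hypothetical: `save_payment_methods`, `PaymentMethod`, and `UnsupportedCapabilityError` are invented names for illustration, and it assumes the request can be rejected before anything is persisted.

```python
from dataclasses import dataclass
from typing import Sequence


class UnsupportedCapabilityError(Exception):
    """Raised when a caller relies on a capability the API only appears to have."""


@dataclass(frozen=True)
class PaymentMethod:
    kind: str
    token: str


def save_payment_methods(customer_id: str, methods: Sequence[PaymentMethod]) -> PaymentMethod:
    """Accept a list (the future-facing contract) but support exactly one entry.

    Hypothetical example: the signature promises more than the code delivers,
    so the gap is stated in the error message instead of in someone's memory.
    """
    if len(methods) != 1:
        # Reject before persisting anything, so no invalid data is saved.
        raise UnsupportedCapabilityError(
            f"customer {customer_id}: received {len(methods)} payment methods, "
            "but only one is supported. Multi-method handling was deferred, not built."
        )
    # Placeholder for the real persistence step (assumed, not shown).
    return methods[0]
```

This does not fix the bug. It only makes sure that whoever trips over it three years from now learns the capability was never finished, without needing the greybeard in the room.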

Seriously though, pragmatically speaking, why am I making a big deal about this? Everyone involved is being reasonable and responsible. There really was no immediate risk that would justify delaying the feature deployment. And what does it have to do with banking? The answer is that this type of risk, or rather, our assessment of and solution for this type of risk, assumes we know the future. And when we wake up one day and find the future isn’t what it used to be, we have no inexpensive way to identify all the broken assumptions.

Let me be clear about the problem. In one sense, this is no different than failing to anticipate a future opportunity – and we would not necessarily beat ourselves up for that. The big risk is not that we someday allow customers to have multiple payment methods and the software breaks. Software breaks all the time. It was anticipated and dismissed, correctly, as low risk. But that was never the real risk.

The real risk is that three years down the road we make commitments based on a false understanding of our assets. What will be remembered most is that we implemented the ability to handle multiple payment methods. We rediscover, too late, that the capability was never fully hydrated. We also discover that the greybeard who made all those prophecies of doom, and is now retired, was right. In short, we discover that the cost of this feature is cripplingly high. Our institutional ability to hold all the pieces of information needed to assess this type of risk across a span of time is limited. We sign ourselves up (habitually) for liabilities to be paid for “soon” with assets whose value will only be realized “someday”. And that is what this has to do with banking.

You may recall the 2008 mortgage crisis (The Global Financial Crisis – you know it was big because it had a “The” in front of it). The risk accumulated in the financial system surrounding mortgage-backed securities was invisible unless you were wearing your secret risk decoder glasses. Then, quite suddenly, Lehman Brothers became the biggest financial institution bankruptcy in history, followed quickly by the biggest bank failure in history (Washington Mutual) and then by a parade of others. Underneath this was a raft (more like flotilla) of duration mismatch combined with too-clever-by-half financial products. Somehow the solutions never seem to deal with the duration mismatch. In fact, part of the solution was to pretend it wasn’t there (look up “mark-to-market” – it’s fascinating). If you have followed the recent troubles of Silicon Valley Bank you may have heard a reference to duration risk as the proximate cause of their woes. Simply speaking – they had customers wanting their deposits now, but those deposits were being held in a form that would only have the required value… then. And because they were not wearing their secret risk decoder glasses, they created a situation in which a huge, submerged risk became a huge, realized risk. And once again none of the “fixes” seem to deal with duration mismatch.

Unfortunately, duration mismatch is hard to see – the same way a forest is hard to see. This is especially true when all the incentives effectively encourage us not to see. The problem is that when we put on the secret risk decoder glasses, we see things that no one wants to see, things that overwhelm our ability to meaningfully respond. Fortunately, when it comes to tech debt, we understand the importance of diligently paying it back according to rigid, well-established terms – he says as the phonograph needle of doom scrapes over the grooves of credulity. When do we ever pay back technical debt? Not quite never. But close… very close. And, eventually, about the time we hit a pothole and things unwind far worse than they should have, someone in the post-mortem puts on the glasses to have a closer look at the soft, soapy “glycerin” that should have greased the skids to all our dreams. They see another word in front of it, veiled by time, “nitro”. It has been there for years… and has backlog items referring to it containing words like “this needs to be revisited”. It may show up as the enervating drag we call software rot. Or it may show up suddenly when it becomes apparent that we needed the capabilities that were left on the cutting room floor (but are somehow still in all the marketing materials). The reality is we do not know when the debt will come due or in exactly what form. But I guarantee we know exactly when the next deadline is. This is what is so insidious about tech debt. It is so easy to not see it. And then you put on the glasses and cannot unsee it.

Well, that’s bleak. What can you do about it? If you read this and it resonated, you are going to find it hard to take the secret risk decoder glasses off. That, alone, is important. It puts you in a position to say the right words at the right time. The time might be a risk assessment. The words are “I don’t know”. Or, because we strive to be constructive and not just obstructive, we might say something like this:

“I don’t know what the upper bound on the impact of this risk is. I can provide a very confident assessment that holds true for the next 30 days. If the risk isn’t mitigated by then it must be assumed that it never will be. And beyond 30 days I can’t have much confidence that the assumptions that keep the risk contained will remain true.”

Maybe it’s not 30 days in your situation. You know your environment. This at least sets the stage for some fruitful discussions. Possible outcomes might include:

1. “Let’s just stop what we’re doing and address this now.” – highest credibility
2. “Deploy, but nothing is higher priority than fixing this until it is fixed.” – still good, but pics or it didn’t happen
3. “Put it in the backlog.” – negative credibility
4. “Are we saying fixing this now is our highest priority? Isn’t there tech debt in the backlog that we would legitimately be even more interested in extinguishing? What if we start measuring the tech debt present in the software and weight the priority of tech debt relative to feature work by the total amount of tech debt present?” – points for understanding the problem, but pics or it didn’t happen
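The weighting idea in that last outcome can be sketched in a few lines. Everything here is an assumption for illustration: the “debt points” unit, the budget, and the floor and ceiling shares are made-up knobs, and a linear scale is simply the easiest choice to reason about.

```python
def debt_capacity_share(
    total_debt_points: float,
    debt_budget_points: float,
    floor: float = 0.1,
    ceiling: float = 0.8,
) -> float:
    """Return the fraction of team capacity to reserve for paying down tech debt.

    Hypothetical model: the share grows linearly from `floor` (no measured debt)
    to `ceiling` (debt at or beyond the budget the team agreed it could carry).
    """
    if debt_budget_points <= 0:
        raise ValueError("debt_budget_points must be positive")
    if total_debt_points < 0:
        raise ValueError("total_debt_points cannot be negative")

    # How much of the agreed budget is already used up, capped at 100%.
    utilization = min(total_debt_points / debt_budget_points, 1.0)
    share = floor + (ceiling - floor) * utilization
    # Clamp to guard against floating-point drift past the ceiling.
    return min(share, ceiling)
```

With these defaults and a budget of 100 points, 50 points of outstanding debt would claim about 45% of capacity. The exact numbers matter far less than the property that feature work gets squeezed automatically as debt accumulates, instead of by someone winning an argument.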

Am I being a little hard-nosed? I have often thought it would be nice to make the code expire – to artificially add a known duration. Check in the greasy hack – in 60 days it will stop working. I expect it would be terrifying… for a while. We could construct technical and procedural mechanisms that would assign a duration to an item of value and then track that. But for any of it to work there must be the will to do it.

The way to beat duration mismatch is not to play. Cut up the credit card (Outcome 1) and start paying down debt (Outcome 4). Having an engineering culture that is fanatical about keeping tech debt low is the only way it will not unwind. And, if you have a culture that is fanatical about keeping tech debt low, you probably won’t need a lot of sophisticated tracking to make sure your assets and liabilities resolve at the same time. Or, to put it bluntly, you won’t have to keep track of which lies you told to whom. And, importantly, a commitment to low tech debt means you have borrowing power when you need it most. That means sometimes Outcome 2 will be a responsible possibility. Because over-constrained circumstances happen, and we keep our feathers numbered for just such an occasion.
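For the curious, the expiring-hack idea is small enough to sketch. This is a thought experiment under stated assumptions, not a recommendation to ship it unmodified: `expiring_hack` and `ExpiredHackError` are hypothetical names, the 60-day default comes from the example above, and the `today` parameter exists only so the behavior can be exercised without waiting two months.

```python
from datetime import date, timedelta
from typing import Optional


class ExpiredHackError(RuntimeError):
    """Raised when a deliberately temporary shortcut has outlived its lifetime."""


def expiring_hack(
    checked_in: date,
    reason: str,
    lifetime_days: int = 60,
    today: Optional[date] = None,
) -> int:
    """Return the days remaining before the hack expires; raise once it has.

    Hypothetical guard: call it at the top of the shortcut it protects.
    """
    current = today if today is not None else date.today()
    expires_on = checked_in + timedelta(days=lifetime_days)
    if current >= expires_on:
        raise ExpiredHackError(
            f"Hack expired on {expires_on.isoformat()}: {reason}"
        )
    return (expires_on - current).days
```

Placed at the top of the greasy hack, a call such as `expiring_hack(date(2024, 1, 1), "single payment method only")` gives the liability the same maturity date as the promise to fix it. A gentler variant would log a warning or fail only in CI, but the hard failure is the version that matches the duration.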