Software Preventative Maintenance

Motivation

When you are actively working on new features in your software, it’s fairly easy to keep dependencies updated and make any framework changes as you go. Even if you have to break that work out into its own work item, separate from the feature work, it’s often not that hard to find a place for it alongside the work you’re already doing. Once your software is more mature, though, you still need to make time to check on it now and then so that the code supporting your products or services doesn’t get left behind. No, your code won’t necessarily collapse spontaneously (most of the time), but it would be pretty bad to let someone exploit a known security vulnerability, or to find out you need a lengthy refactoring before you can release some other urgent change. Physical machines usually get regular preventative maintenance to keep them running smoothly (for example, changing the oil in your car); we need to do the equivalent for software.

Static code analysis tools can automate parts of this regular inspection and maintenance, but having a plan for the more involved updates will help you take on those changes at a time of your choosing rather than discovering them at an inconvenient moment.

The process sounds simple in theory: make a list of what you have, make a list of what you care about checking, then take the Cartesian product and fix everything that needs fixing 😀 In practice, there are some details to think about, so I’ll point out some tips along the way.
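If you keep these two lists somewhere scriptable, the “Cartesian product” step is a single call. This is a minimal Python sketch; the service names and criteria below are hypothetical placeholders, not the lists our team actually used.

```python
from itertools import product

# Hypothetical inventory and criteria; substitute your own lists.
services = ["orders-api", "billing-worker", "search-service"]
criteria = ["dependencies up to date", "runbook", "logging", "dashboards"]

# The Cartesian product pairs every service with every criterion,
# giving one check to perform per (service, criterion) pair.
checklist = list(product(services, criteria))

for service, criterion in checklist:
    print(f"[ ] {service}: {criterion}")
```

Three services and four criteria give twelve checks, which is a quick way to see how fast the list grows as you add microservices or raise the bar.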

Making A List Of Services

For the most recent round of updates that I kicked off, we started from our list of repos in source control and filtered out one-off tools, anything no longer running in production, and anything about to be decommissioned. You could also start from a list of the services you have in production (e.g., in your cloud provider’s portal) or from a service inventory tool like Backstage. If you only have one or two services, this might not be that challenging. If you like microservices, you might have a much longer list!
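If your repo list carries a little metadata, the filtering step can be scripted as well. This is a hypothetical sketch: the `Repo` fields are assumptions made up for illustration (in practice that information might live in your source control host or a service catalog), and the filter simply drops one-off tools, repos not running in production, and anything about to be decommissioned.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    # Hypothetical metadata fields; your inventory will differ.
    name: str
    one_off_tool: bool = False
    in_production: bool = True
    decommissioning_soon: bool = False


def services_to_review(repos):
    """Keep only repos that back a live, long-lived service."""
    return [
        r.name
        for r in repos
        if not r.one_off_tool and r.in_production and not r.decommissioning_soon
    ]


repos = [
    Repo("orders-api"),
    Repo("data-fix-script", one_off_tool=True),
    Repo("legacy-reports", in_production=False),
    Repo("old-gateway", decommissioning_soon=True),
    Repo("billing-worker"),
]
print(services_to_review(repos))  # ['orders-api', 'billing-worker']
```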

Making A List Of Criteria

Starting Point

At the place I work, we’re fortunate to have a fairly thorough company-wide best practices document that multiple contributors have created and improved over time. The document gives development teams guidance on what a “good” service looks like for us, with the intention that all of the necessary documentation and other good practices (e.g., runbooks, logging, distributed tracing, dashboards) are in place before a service is released to production. If you don’t have a document like this yet, I’d suggest starting one, maybe basing it on something available online. Depending on the maturity of your code and team, you may be able to take your best practices list and implement anything that’s missing in your first cycle of updates. If you can’t, I would suggest starting with a smaller selection of crucial items and adding more each time through. The intention here is that the review should result in a list of things that will actually get done, not just get added to the backlog to be done “someday”.

There is always something to be improved or updated, and questing after perfection can mean that you never actually ship your improvements, so I suggest time-boxing the project: choose a fixed period of time for one cycle of updates. Again, if you can’t get to everything you want to do in the first block of time, expand your list in future cycles and keep raising the bar until all of your work meets the standard.

What Does It Mean?

Some of the items on your best practices list may not be entirely obvious from the name alone. For example, if you have an item called “Dashboards”, how do you know whether your dashboards show enough data, or the right kind of data? A more senior developer may be able to fill in the blanks on their own, but others may benefit from more direction about what is expected for each item. Our team had success holding a few brief discussions about anything hazy to get aligned, so that the people doing the checking know what they’re looking for.

Checking Your Services

Once you have your list of services and the list of standards you want to meet, you’ll need to evaluate each service and decide whether it needs any work to meet those standards.

I suggest having a way to keep track of who is evaluating which service and what the results of each review are. For this round, we used one work item per service for the initial evaluation, which allowed each person to take pieces that fit into their available time. Each “failing” requirement that turned up became its own work item as well. We kept grouping these by service, with a sub-item for each necessary change so that different people could work on different items.
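We tracked this in our work item tool, but the shape of the data is simple enough to sketch. Assuming review results are recorded as hypothetical `(service, criterion, passed)` tuples, this groups the failures into one parent item per service with a sub-item per failing requirement; a service that passes everything produces no work item at all.

```python
from collections import defaultdict

# Hypothetical review results: (service, criterion, passed)
results = [
    ("orders-api", "runbook", True),
    ("orders-api", "dashboards", False),
    ("orders-api", "distributed tracing", False),
    ("billing-worker", "runbook", False),
    ("billing-worker", "dashboards", True),
]


def group_failures(results):
    """Return {service: [failing criteria]}: one parent item per service."""
    work_items = defaultdict(list)
    for service, criterion, passed in results:
        if not passed:
            work_items[service].append(criterion)
    return dict(work_items)


# Print each parent work item followed by its sub-items.
for service, subitems in group_failures(results).items():
    print(f"Update {service}")
    for criterion in subitems:
        print(f"  - Fix: {criterion}")
```

Keeping each failing criterion as its own sub-item is what lets different people pick up different fixes for the same service.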

Executing

From here, it is “just” a matter of working through the list, bringing in enough work items from the previous step to keep the work moving without overriding your other priorities. This requires coordination with stakeholders like your product owner to make sure the value of the work is understood and that these changes are properly prioritized and communicated to those who need to know.

The Whole Team Should Participate

I mentioned to my team that I had “multiple ulterior motives” for starting to explicitly organize this kind of work as a project. I’ll talk about a couple of those here.

Even Out Gaps In New Work

Sometimes the arrival of new feature work slows down while your client or other stakeholders complete their parts of the process. In these cases, some people on the team might have time available, but if you don’t have something ready to give them, it can be difficult and disruptive to come up with an idea on the spot and hand it over. By preparing a pool of work like this, which you can pull from during regular sprint planning as well as to fill unexpected gaps, you save yourself the hassle of digging through the backlog for something for someone to work on.

Trying New Things

Keeping your services updated has value for customers, but it’s also an opportunity for the team to grow. It’s important to be able to define our technical standards as a team and normalize speaking up when we need to improve what we’re doing. Choosing what to update, evaluating code, identifying issues, writing bugs or user stories to fix what is found, and doing the implementation should all be spread across the team to give everyone experience with new and old services and with any parts of the process they haven’t otherwise participated in. Telling your more junior developers that yes, you trust them to write up a user story to update a service can be a low-risk confidence booster for someone who is new to helping decide what your team’s output should look like. Team discussions or individual research (brought back to the rest of the team) can then be used to fill in any individual or collective gaps in knowledge or experience.

Your Lead Doesn’t Know Everything

Contrary to what some may believe, being a senior developer or lead doesn’t mean that you know all the answers about software development. I look things up on Stack Overflow or weird blogs as much as anyone else. Sometimes you don’t really know where you’re going to end up when you get started, but you still need to find a solution to a problem. Giving developers the opportunity and space to practice navigating towards those solutions is another good way to introduce newer developers to the idea that we can get to some kind of end result in uncertain situations. For more reading on this, I recommend the article Surviving the Organizational Side Quest which talks about navigating this kind of uncertainty internally within a larger organization.

Opportunities

When I started a round of these updates with my team, I assumed, as a relatively new lead, that they might see it as a boring slog, but the opposite seemed to be true. So far, people have been supportive and engaged, and I hope to keep that going over time. Turning an obligation into an opportunity for growth seems to be the part that has worked here. We all have high standards and want to do a good job, and this kind of continuous improvement process can help provide a structure for those standards to actually become reality.