There is a lot of focus in DevOps on the tools used for delivery automation, especially configuration management (e.g. Ansible, Puppet, Chef) and release automation (CA-Noliosoft, IBM-UrbanCode). The goal of these tools is to automate delivery and so increase Mean Time Between Failures (MTBF) by reducing the failures caused by deployment and configuration problems.

That doesn’t completely solve delivery-related app availability issues. Availability is the ratio of uptime to total operation time, where operation time is uptime plus the mean time to detect (MTTD) and the mean time to repair (MTTR) each failure. Increasing MTBF alone may not help with MTTR and doesn’t address MTTD at all. These tools can help with MTTR if you are willing to assume that you fix problems by redeployment. There are cases when that is enough, for example in a containerized environment where you can redeploy only the affected containers.
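To make the ratio concrete, here is a minimal Python sketch of the availability definition above. The function name and all the hour figures are made up for illustration; they are not measurements from any real system.

```python
def availability(mtbf_hours, mttd_hours, mttr_hours):
    """Uptime divided by total operation time (uptime + detect + repair)."""
    return mtbf_hours / (mtbf_hours + mttd_hours + mttr_hours)


# Illustrative numbers only (assumed, not measured).
baseline = availability(mtbf_hours=100, mttd_hours=2, mttr_hours=2)

# Doubling MTBF while detection and repair stay as slow as before.
longer_mtbf = availability(mtbf_hours=200, mttd_hours=2, mttr_hours=2)

# Same MTBF as the baseline, but faster detection and repair.
faster_recovery = availability(mtbf_hours=100, mttd_hours=0.5, mttr_hours=0.5)

print(f"baseline:        {baseline:.4f}")
print(f"longer MTBF:     {longer_mtbf:.4f}")
print(f"faster recovery: {faster_recovery:.4f}")
```

With these assumed figures, shortening detection and repair raises availability more than doubling MTBF does, which is why MTTD and MTTR deserve attention alongside MTBF.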

But if you think about it, DevOps automation can actually increase the time needed to diagnose a problem (the decompose step in IDEA). Before automation, Ops had intimate knowledge of every detail of the deployment because they performed the release manually. Since the tool now masks those details, Ops won’t be able to find issues as quickly.

Production Assurance complements DevOps by lowering MTTD and MTTR, that is, by shortening the time to detect, diagnose and repair deployment issues:

  • Shorter time to detect – using the fingerprint from staging, Production Assurance can proactively detect deployment problems.
  • Shorter time to diagnose and repair – Production Assurance provides details on where the deployment issues occurred and maps them to the fingerprint, making them easier to repair.
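The idea of checking production against a staging fingerprint can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how a fingerprint is built, so it is modeled here as a flat dictionary of configuration keys and values, and `diff_fingerprint` and all sample keys are hypothetical.

```python
def diff_fingerprint(staging, production):
    """Return {key: (staging_value, production_value)} for every mismatch.

    A key missing on one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(staging.keys() | production.keys()):
        expected = staging.get(key)
        actual = production.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical fingerprint captured in staging (assumed format).
staging_fingerprint = {
    "app.version": "2.4.1",
    "db.pool_size": "50",
    "feature.cache": "on",
}

# Hypothetical state observed in production after the deployment.
production_state = {
    "app.version": "2.4.1",
    "db.pool_size": "10",
}

drift = diff_fingerprint(staging_fingerprint, production_state)
for key, (expected, actual) in drift.items():
    print(f"{key}: staging={expected!r} production={actual!r}")
```

Running such a comparison right after each deployment addresses detection, and the list of mismatched keys points at where to look, which addresses diagnosis and repair.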

So combining Configuration Management and Production Assurance increases MTBF and lowers MTTD and MTTR, enabling IT to meet the true goal of increased app availability.