Last time in the Software Quality Defense in Depth series, we went over how effective planning helps you control quality risks early on. This is one of the earliest lines of defense in the battle against defects. This time, let’s look at one of the last lines of defense: Monitoring Software Quality via Error Tracking.

In this article we’ll look at the importance of error tracking solutions, their role in software quality, and how to use them effectively.

We’ll also touch on typical setup and monitoring and look at a few of the more popular error tracking services out there, but the focus is on the concept of error monitoring and its role in your development lifecycle.

The Problem

We spend a lot of time writing and testing software before releasing it to production. But how much time do we really spend monitoring and ensuring software quality once that code is in production?

In my experience, organizations routinely release code into the wild with no proactive monitoring plan. Sure, we have support departments and ways for customers to report issues, but by the time a customer reports a problem, it’s usually too late to catch it before it affects users.

Let me put it this way: if you have a critical problem with newly deployed code, do you want to learn about it from the help desk manager once support tickets are pouring in, or do you want to know sooner?

Introducing Error Tracking

Error tracking software, unsurprisingly enough, exists to track application errors. Think of it as a centralized collection service for errors and log events that occur in various places, including:

  • Your web server
  • JavaScript executed by web browsers
  • Mobile applications
  • Desktop applications
  • Automated console applications or services

By introducing a single solution to collect error information, you can monitor a wide variety of applications in one centralized place. These products are typically API-based with a management and monitoring user interface in the form of a web application and possibly accompanying mobile apps.

Error Collection & Triage

Once an error occurs, it is reported to the error tracking service, which stores details about it in its internal database. The service then looks for similar errors and automatically groups them together, which helps you track and prioritize individual problems.
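Each vendor has its own grouping rules, so treat the following as a rough sketch of the general idea rather than how any particular service works. It assumes a hypothetical `ErrorEvent` shape and builds a "fingerprint" from the error type plus the stack frames, so occurrences of the same underlying problem land in the same bucket even when their messages differ.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    """One reported occurrence of an error (hypothetical shape)."""
    error_type: str
    message: str
    frames: tuple[tuple[str, str], ...]  # (file, function) pairs from the stack trace


def fingerprint(event: ErrorEvent) -> str:
    """Build a grouping key from the error type and stack frames.

    The message is left out on purpose: it often embeds per-occurrence
    values (IDs, timestamps) that would split one problem into many groups.
    """
    parts = [event.error_type] + [f"{file}:{func}" for file, func in event.frames]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


def group_events(events: list[ErrorEvent]) -> dict[str, list[ErrorEvent]]:
    """Bucket raw occurrences by fingerprint."""
    groups: dict[str, list[ErrorEvent]] = defaultdict(list)
    for event in events:
        groups[fingerprint(event)].append(event)
    return dict(groups)


events = [
    ErrorEvent("KeyError", "'user_42'", (("app.py", "load_user"),)),
    ErrorEvent("KeyError", "'user_99'", (("app.py", "load_user"),)),
    ErrorEvent("ValueError", "bad date", (("dates.py", "parse_date"),)),
]
groups = group_events(events)

# List problems by how often they occur, most frequent first.
for occurrences in sorted(groups.values(), key=len, reverse=True):
    print(occurrences[0].error_type, len(occurrences))
```

Sorting by raw count is only a crude stand-in for the prioritization a real service gives you, but it shows why grouping matters: two `KeyError` reports with different messages become one problem to triage instead of two.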

An error tracking screen in Raygun

Tracking systems alert you in whatever ways you’ve configured them to, from emails and Slack messages to automatically creating new work items in Jira or other issue tracking systems.

The user interface lets you browse unresolved exceptions, drill into their details, and see how often each one occurs and when it was first seen.

I want to stress that to use an error tracking system properly, you must define a process for your organization to triage incoming errors. I recommend a rotation where engineers review new errors on their assigned day or week and decide whether further action is needed.

If you fail to track and triage errors as they come in, they become noise and just part of the daily routine. Errors should never become acceptable or something you routinely ignore.

Most systems let you assign, merge, and resolve errors, which helps with triage. They will generally re-open a resolved error if it recurs or, if the system is configured to track your deployments, only if it recurs in a newly released version of the application.
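To make that lifecycle concrete, here’s a minimal sketch of one error group’s triage state. The `ErrorGroup` model and its method names are hypothetical, not any vendor’s API, and merging is left out for brevity. The part worth noticing is `record_occurrence`: without deployment tracking, any recurrence re-opens a resolved error; with it, occurrences from the release that was live when the error was resolved are ignored, and only a different release (typically the one carrying your fix) re-opens it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    UNRESOLVED = "unresolved"
    RESOLVED = "resolved"


@dataclass
class ErrorGroup:
    """Triage state for one grouped error (hypothetical model)."""
    fingerprint: str
    status: Status = Status.UNRESOLVED
    assignee: Optional[str] = None
    resolved_on_version: Optional[str] = None  # version deployed when it was resolved

    def assign(self, engineer: str) -> None:
        self.assignee = engineer

    def resolve(self, deployed_version: Optional[str] = None) -> None:
        self.status = Status.RESOLVED
        self.resolved_on_version = deployed_version

    def record_occurrence(self, app_version: str, track_deployments: bool) -> bool:
        """Apply a new occurrence and return True if it re-opened the group."""
        if self.status is Status.UNRESOLVED:
            return False  # already open; nothing to change
        if track_deployments and app_version == self.resolved_on_version:
            # The old release is still running somewhere; wait for the fix to ship.
            return False
        self.status = Status.UNRESOLVED
        return True


group = ErrorGroup("a1b2c3")
group.assign("dana")
group.resolve(deployed_version="1.4.0")

group.record_occurrence("1.4.0", track_deployments=True)  # old release: stays resolved
group.record_occurrence("1.5.0", track_deployments=True)  # new release: re-opened
print(group.status)
```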

Error Details

The wealth of detail captured for each error is remarkable, and it’s a key reason to adopt a system like this.

Typically you get a wide range of data about the browser, operating system, device, and/or web request involved in the error (depending on the type of application the error occurred in, of course).

A specific instance of an error, logged in Raygun. Note the tabs at the top to get more details.

This extra detail can help you determine whether a problem is affecting a broad range of users or only a few. It can also reveal specific browsers that are encountering issues. This latter case is extremely common with client-side JavaScript, where not all browsers support all JavaScript features (typically a polyfill or shim is needed for these).
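Tracking services compute these breakdowns for you, but as a rough illustration of the reasoning, here’s a sketch over a hypothetical list of occurrences. It counts distinct affected users and each browser’s share of occurrences; a share close to 1.0 for a single browser is a strong hint that you’re looking at a browser-specific problem rather than a general one.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One occurrence of a grouped error (hypothetical shape)."""
    user_id: str
    browser: str  # e.g. derived from the request's User-Agent header


def impact_summary(occurrences: list[Occurrence]) -> tuple[int, dict[str, float]]:
    """Return distinct affected users and each browser's share of occurrences."""
    if not occurrences:
        return 0, {}
    users = {o.user_id for o in occurrences}
    counts = Counter(o.browser for o in occurrences)
    total = len(occurrences)
    return len(users), {browser: n / total for browser, n in counts.items()}


sample = [
    Occurrence("u1", "Safari"),
    Occurrence("u2", "Safari"),
    Occurrence("u3", "Safari"),
    Occurrence("u1", "Chrome"),
]
affected_users, shares = impact_summary(sample)
print(affected_users, shares)
```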

Usage Recommendations

I recommend that you adopt an error tracking solution in every application that runs regularly in production or is headed there.

Wherever possible, I also recommend routing your error logging through a single facade class and having the rest of your code call that facade. This makes it easier to switch from one vendor to another and to include global data consistently.
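Here’s a minimal sketch of such a facade. `ErrorReporter`, `ErrorBackend`, and `InMemoryBackend` are hypothetical names; a real backend would wrap whichever vendor SDK you choose, so switching vendors means writing a new backend rather than touching every call site. Global data such as the app version is attached in one place.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Protocol


class ErrorBackend(Protocol):
    """Anything that can ship an error to a tracking service."""

    def send(self, error: BaseException, context: dict[str, Any]) -> None: ...


class InMemoryBackend:
    """Stand-in backend for this sketch; a real one would wrap a vendor SDK."""

    def __init__(self) -> None:
        self.sent: list[tuple[BaseException, dict[str, Any]]] = []

    def send(self, error: BaseException, context: dict[str, Any]) -> None:
        self.sent.append((error, context))


@dataclass
class ErrorReporter:
    """The single facade the rest of the codebase calls to report errors."""
    backend: ErrorBackend
    global_context: dict[str, Any] = field(default_factory=dict)

    def report(self, error: BaseException, **context: Any) -> None:
        # Global data (version, environment, ...) first; per-call values win on conflict.
        merged = {**self.global_context, **context}
        try:
            self.backend.send(error, merged)
        except Exception:
            # Reporting an error should never crash the code that hit the error.
            logging.getLogger(__name__).exception("Failed to send error report")


backend = InMemoryBackend()
reporter = ErrorReporter(backend, {"app_version": "2.3.1", "environment": "staging"})

try:
    int("not a number")
except ValueError as exc:
    reporter.report(exc, user_id="u42")
```

Swallowing backend failures inside the facade is a deliberate choice in this sketch: an outage at your tracking vendor shouldn’t turn a handled error into an unhandled one.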

I recommend that you make error logging configurable, with a flag that disables sending errors to the tracking system entirely and a setting that stores the API key provided by the tracking system.
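A minimal sketch of that configuration, assuming the settings come from environment variables named `ERROR_TRACKING_ENABLED` and `ERROR_TRACKING_API_KEY` (hypothetical names; use whatever configuration system you already have). Treating a missing API key as "disabled" is a design choice here: with no key there’s nowhere to send errors anyway.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ErrorTrackingConfig:
    enabled: bool
    api_key: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> ErrorTrackingConfig:
    """Read the on/off flag and API key (variable names are hypothetical)."""
    flag = env.get("ERROR_TRACKING_ENABLED", "false").strip().lower()
    enabled = flag in {"1", "true", "yes"}
    api_key = env.get("ERROR_TRACKING_API_KEY") or None
    # Without a key there is nowhere to send errors, so treat it as disabled.
    return ErrorTrackingConfig(enabled=enabled and api_key is not None, api_key=api_key)


config = load_config({"ERROR_TRACKING_ENABLED": "true", "ERROR_TRACKING_API_KEY": "abc123"})
print(config)
```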

I also recommend that you keep error logging enabled in testing environments, since everyday use there can surface errors under the covers that aren’t directly visible to testers.

For this reason, I recommend reviewing the errors logged during testing before pushing any code to production.
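One lightweight way to make that review part of your release routine is a small script that lists what still needs attention. This sketch assumes you can export error groups from your tracker into the hypothetical `ErrorSummary` shape below (the services are API-based, but the details vary, so I’m not modeling any particular one) and that your test environments are named `testing` and `staging`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorSummary:
    """One error group as exported from your tracker (hypothetical shape)."""
    title: str
    environment: str
    version: str
    resolved: bool


def errors_to_review(
    errors: list[ErrorSummary],
    candidate_version: str,
    test_environments: frozenset[str] = frozenset({"testing", "staging"}),
) -> list[ErrorSummary]:
    """Unresolved errors seen in test environments on the release candidate."""
    return [
        e for e in errors
        if not e.resolved
        and e.environment in test_environments
        and e.version == candidate_version
    ]


exported = [
    ErrorSummary("TypeError in checkout.js", "staging", "2.4.0", resolved=False),
    ErrorSummary("Timeout calling payments API", "staging", "2.4.0", resolved=True),
    ErrorSummary("KeyError in load_user", "production", "2.3.1", resolved=False),
]
for item in errors_to_review(exported, candidate_version="2.4.0"):
    print(f"Review before release: {item.title} ({item.environment})")
```

The script only produces a checklist; deciding whether an item blocks the release is still a human call made during triage.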

Different Error Tracking Solutions

Okay, now that we’ve talked about what these solutions are and how they fit into your everyday flow, let’s look briefly at some of the major players out there.

  • Raygun – A high-end error tracking service with slick reporting and charting. The real value here comes when you add in the user tracking and application performance monitoring features to get a true picture of web site performance and behavior.
  • Rollbar – A web-based exception tracking service with a low-volume free tier of monthly usage. Rollbar supports a wide variety of languages and environments and gives you a lot of data out of the box.
  • Sentry – Another web-based exception tracking service. Some former coworkers of mine swear by this one.
  • LogRocket – An up-and-coming error tracking solution offering some interesting error replay features for web-based exceptions.
  • Airbrake – Another significant web-based service. I haven’t looked into it in detail yet.
  • OverOps – Another interesting tool I haven’t played with yet. It has some nice-looking dashboards and a very interesting performance trends feature that lets you identify significantly slower methods.
  • New Relic – Technically more of an application performance monitoring tool, New Relic can also track errors. Typically, though, you’ll need one of the other options to get significant detail on each error.
  • Cloud-specific solutions (Azure Monitor, AWS CloudWatch) – Baked into the cloud services you likely already use, these can be an easy way for organizations already in the cloud to adopt error monitoring.

I’ll be writing more about Raygun near the end of the year, so stay tuned if you want an in-depth look at how to configure exception logging.

Recommendations

I recommend you look over the options above and figure out which one best suits your needs based on the languages you use and types of applications you deploy.

Give one or two of these tools a free trial and see what they tell you about your applications and processes.

I can tell you from experience that it sucks to flip the on switch for the first time and see everything wrong with your applications. Once you get past that initial cleanup, though, these tools can keep defects from getting past preview or testing environments in the first place, or give you the extra detail you need to reproduce a tricky bug.

From a customer service perspective, having the error already logged means you need less information from a customer who contacts you about a problem, because you likely already have what you need and may be working on a fix by the time they get in touch.


Try it out and let me know what you think. If you know more about any of the solutions I mentioned but haven’t tried, or know of one I didn’t cover, let me know!
