If Your Code Crashes in Production, Does it Make a Sound?

by Ville Laurikari on Wednesday, November 18, 2009


Meet Frank. Frank works as a product manager at Faxes-R-Us. He’s finishing up a presentation on their latest fax machine model: FRU-1221i with integrated Twitter and Facebook support. This one will sell like hot cakes.

Frank’s presentation is due tomorrow, and he’s now trying to include pictures on some of the slides. At Faxes-R-Us, they have a company policy to build all presentations using SlidePointX. This is where you come in: SlidePointX is your product. As Frank drags a picture of the FRU-1221i into SlidePointX, the program promptly crashes. No worries, mutters Frank and tries again. Another crash. Again. Crash. Again. Crash. Frank starts to panic. Again. It worked! Phew.

What do you think will be Frank’s next step? Will he reproduce the bug, maybe inspect some crash dumps, write a nice detailed bug report, and send it to you? No, I didn’t think so either. All Frank wants is to finish his presentation, so that’s what he’s going to concentrate on. Your product just crashed four times in production use, and you will never hear about it.

Enter automatic error reporting.

Automatic error reporting, a.k.a. crash reporting, is an application or technology which captures software crash data and sends it to you (with the user’s consent, of course). The particulars vary between different solutions, but typically you get access to crash dumps or at least stack traces.
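To make the idea concrete, here is a minimal sketch of the capture-and-send loop for a Python application. It is an illustration only, not how WER or any particular product works. The names `build_crash_report` and `make_excepthook`, the `app_version` placeholder, and the `ask_consent` and `send` callbacks are all made up for this example; a real reporter would also have to deal with native crashes, which this sketch cannot see.

```python
import json
import platform
import sys
import traceback


def build_crash_report(exc_type, exc_value, exc_tb, app_version="0.0.0-example"):
    """Collect crash details into a JSON-serializable dict.

    app_version is a placeholder; a real application would pass its own.
    """
    return {
        "app_version": app_version,
        "platform": platform.platform(),
        "python": platform.python_version(),
        "exception": exc_type.__name__,
        "message": str(exc_value),
        "stack": traceback.format_exception(exc_type, exc_value, exc_tb),
    }


def make_excepthook(ask_consent, send):
    """Return a sys.excepthook replacement that reports only with consent.

    ask_consent(report) -> bool and send(payload) are supplied by the
    application; both are hypothetical hooks for this sketch.
    """
    def hook(exc_type, exc_value, exc_tb):
        # Keep the default behaviour: print the traceback to stderr.
        sys.__excepthook__(exc_type, exc_value, exc_tb)
        report = build_crash_report(exc_type, exc_value, exc_tb)
        if ask_consent(report):
            send(json.dumps(report))
    return hook
```

An application would install it once at startup with something like `sys.excepthook = make_excepthook(ask_user, upload)`, where `ask_user` shows a dialog and `upload` posts the payload to wherever you collect reports. Passing the report to `ask_consent` lets the user see exactly what would be sent before agreeing.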

Automatic error reports are invariably much more detailed than anything a normal user could ever give you. There are few things in life more frustrating than trying to reproduce a bug based on inadequate information. A full crash dump is pure bliss compared to “crashes when dragging pictures, please fix”.

But that’s not where the real power of automatic error reporting lies. Here’s what Microsoft has to say about Windows Error Reporting (WER):

Broad-based trend analysis of error reporting data shows that across all the issues that exist on the affected Windows platforms and the number of incidents received:

  • Fixing 20 percent of the top-reported bugs can solve 80 percent of customer issues.
  • Addressing 1 percent of the bugs would address 50 percent of the customer issues.

Let me repeat that last part in case you missed it: Addressing 1 percent of the bugs would address 50 percent of the customer issues. You can’t do that if you don’t know what the top bugs are. Automatic error reporting provides you with invaluable intelligence about the stability of your product across different versions and over time.
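Finding the top bugs is mostly a counting exercise once the reports are coming in. The sketch below groups reports by a crash signature and shows how much of the total each bucket accounts for. The signatures and counts are invented sample data, and treating a short call path as the signature is an assumption made for illustration; it is not a description of how WER buckets its reports.

```python
from collections import Counter


def top_crash_buckets(signatures):
    """Group crash reports by signature and rank the buckets by frequency.

    Returns a list of (signature, count, cumulative_share) tuples,
    most frequent first. An empty input yields an empty list.
    """
    counts = Counter(signatures)
    total = sum(counts.values())
    ranked = []
    running = 0
    for signature, count in counts.most_common():
        running += count
        ranked.append((signature, count, running / total))
    return ranked


# Made-up sample data: each string stands for one received crash report.
reports = (
    ["drag_image -> decode_png"] * 6
    + ["save_file -> write_header"] * 3
    + ["print -> spool"] * 1
)

for signature, count, cumulative in top_crash_buckets(reports):
    print(f"{count:3d}  {cumulative:5.0%}  {signature}")
```

The cumulative column is the interesting one: it tells you how far down the list you have to fix before you have covered most of the crashes your users actually hit.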

WER is built into Windows. If you ship applications on Windows, you need to start looking at your WER reports. I insist. To get going, the only cost for you is to get a code-signing certificate from VeriSign and you’re all set for WER. Don’t just sit there, go do it now!

On other platforms (Mac, Linux, commercial Unix), the sad state of affairs seems to be that there are typically no built-in tools that software vendors can use. If you want automatic error reports, you may have to implement the reporting yourself. If you know about non-Windows solutions in this area, please let me know in the comments.

One of your goals should obviously be that there are no crashes in the field. But just in case there are problems, wouldn’t you like to see the full gory details?



{ 3 comments }

Martin Klepsch May 11, 2010 at 15:48

I’m not sure if this is what you’re looking for, but take a look for yourself:
http://code.google.com/p/google-breakpad

And: nice typography for a programmer’s blog

-Martin

Ville Laurikari May 11, 2010 at 15:55

Martin, many thanks for the link! Breakpad looks like it might be exactly what I was looking for. AIX and HP-UX support are missing, but the code looks clean and platform support should be straightforward to add.

Ville Laurikari May 12, 2010 at 21:44

Took a closer look at Breakpad. What I’d really like to see is a service which receives the dumps, does some analysis and statistics on them, and allows me to view that from a nice web interface. Would pay money for that. Now we’d have to build that system ourselves…
