Metric of the Month: Duplicate Code

by Ville Laurikari on Wednesday, February 10, 2010

There are few hard and fast rules in software development, but this one comes close: don’t copy and paste code. Keeping multiple copies around within the same code base is almost never a good idea.

Question is, do you have duplicate code in your code base? How much would you be willing to bet?

Recently I stumbled across a great tool called Simian. It scans through your code and reports any duplicate lines. It understands enough programming language syntax to ignore comments, whitespace, curly braces, and variable names.
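To make the idea concrete, here is a minimal sketch of line-based duplicate detection in Python. It is not Simian's actual algorithm, just an illustration under simplifying assumptions: it only strips `//` comments, whitespace, and lone curly braces, it does not ignore variable names, and the names `normalize`, `find_duplicates`, and `MIN_LINES` are made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical threshold: how many consecutive significant lines must match.
MIN_LINES = 3

def normalize(line):
    """Strip '//' comments and all whitespace; blank out empty lines and lone braces."""
    line = re.sub(r"//.*", "", line)   # naive: also matches '//' inside string literals
    line = re.sub(r"\s+", "", line)
    return "" if line in ("", "{", "}") else line

def find_duplicates(files, min_lines=MIN_LINES):
    """Map each repeated window of normalized lines to its (file, line) locations.

    `files` is a dict of {filename: source text}.
    """
    windows = defaultdict(list)
    for name, text in files.items():
        # Keep (original line number, normalized text) for significant lines only.
        significant = [
            (number, norm)
            for number, raw in enumerate(text.splitlines(), start=1)
            if (norm := normalize(raw))
        ]
        # Slide a fixed-size window over the significant lines.
        for i in range(len(significant) - min_lines + 1):
            chunk = tuple(norm for _, norm in significant[i:i + min_lines])
            windows[chunk].append((name, significant[i][0]))
    # A window is a duplicate if it occurs in more than one place.
    return {chunk: locs for chunk, locs in windows.items() if len(locs) > 1}
```

A real tool would also merge overlapping windows into maximal chunks and handle each language's comment and string syntax properly; this sketch skips both.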

Simian hasn’t been updated for a couple of years now, but it works. It’s super easy to point it to your code and have it find duplicates. I encourage you to try it. Unfortunately, you may be in for an unpleasant surprise.

Friends don’t let friends copy & paste. That’s why you’ll want to set up this tool (or something similar ;) to run nightly against your code base to see if you’re getting better or worse. I’ve built a “total duplication” count and a “top 10 biggest chunks of duplicate code” report, both updated every night.
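As a rough sketch of what such a nightly report could compute, the Python below summarizes a list of duplicate blocks. Everything here is an assumption for illustration: the `DuplicateBlock` record is a hypothetical shape that you would fill in by parsing whatever your detector prints, and "total duplication" is defined here as the number of redundant lines (every copy beyond the first), which may not match how any particular tool counts.

```python
from dataclasses import dataclass
import heapq

@dataclass(frozen=True)
class DuplicateBlock:
    """One reported chunk of duplicate code (hypothetical record shape)."""
    line_count: int   # length of the duplicated chunk, in lines
    locations: tuple  # (filename, start_line) for every copy of the chunk

def total_duplication(blocks):
    """Count redundant lines: every copy beyond the first counts as duplication."""
    return sum(b.line_count * (len(b.locations) - 1) for b in blocks)

def top_chunks(blocks, n=10):
    """Return the n longest duplicate chunks, biggest first."""
    return heapq.nlargest(n, blocks, key=lambda b: b.line_count)

def format_report(blocks, n=10):
    """Render the total count followed by a ranked list of the biggest chunks."""
    lines = [f"Total duplicated lines: {total_duplication(blocks)}"]
    for rank, block in enumerate(top_chunks(blocks, n), start=1):
        where = ", ".join(f"{name}:{start}" for name, start in block.locations)
        lines.append(f"{rank}. {block.line_count} lines in {where}")
    return "\n".join(lines)
```

Storing the total from each night's run next to the date gives you the trend line: if the number goes up, someone has been copying and pasting.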

{ 5 comments }

Simon Harris February 10, 2010 at 13:14

For the past 2.5 years I’ve been distracted by family matters but I have some updates to Simian in the works :)

Ville Laurikari February 10, 2010 at 14:34

Simon, thanks – I know what you mean. I do hope you can find the time to keep Simian updated in the future. It’s a great tool.

kiran puttur March 3, 2010 at 07:55

Well, that’s true. I find duplicate code all the time, but sometimes I’m very afraid to remove it from production code… if it’s too old. I haven’t tried Simian, but I’ll give it a shot.

Ville Laurikari March 3, 2010 at 20:10

Working with legacy code is annoying. Eliminating duplication rarely bites you, though. What bites you is fixing a bug but forgetting to fix it in the other three places…

If you work a lot with legacy code, you may find Working Effectively with Legacy Code interesting.

Ira Baxter April 6, 2010 at 11:36

See http://www.semanticdesigns.com/Products/Clone/ for a tool that detects duplicate code in spite of format changes, minor editing, inserted statements, etc., using full language parsers to guide the analysis process. It works for a wide variety of languages. See the website for sample reports of clone detection applied to different source applications in different languages.
