There are few hard and fast rules in software development, but this one comes close: don’t copy and paste code. Keeping multiple copies around within the same code base is almost never a good idea.
Question is, do you have duplicate code in your code base? How much would you be willing to bet?
Recently I stumbled across a great tool called Simian. It scans through your code and reports any duplicate lines. It understands enough programming language syntax to ignore comments, whitespace, curly braces, and variable names.
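To give a feel for what "ignoring comments and whitespace" buys you, here is a minimal sketch of the general idea. It is not Simian's actual algorithm. The function names, the three-line window, and the comment prefixes are all assumptions made for illustration, and unlike Simian it does not look past variable names.

```python
from collections import defaultdict


def normalize(line):
    """Return the line without surrounding whitespace, or None if it should be ignored."""
    stripped = line.strip()
    # Ignore blank lines and lines holding only a curly brace.
    if not stripped or stripped in ("{", "}"):
        return None
    # Ignore whole-line comments (assumed prefixes; real tools parse per language).
    if stripped.startswith(("//", "#")):
        return None
    return stripped


def find_duplicates(sources, window=3):
    """Find runs of `window` normalized lines that occur more than once.

    `sources` maps a file name to its text. The result maps each repeated
    run (a tuple of lines) to a list of (file name, first line number).
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        # Keep the original line number next to each surviving line.
        kept = []
        for number, raw in enumerate(text.splitlines(), start=1):
            normalized = normalize(raw)
            if normalized is not None:
                kept.append((number, normalized))
        # Slide a fixed-size window over the surviving lines.
        for start in range(len(kept) - window + 1):
            chunk = tuple(content for _, content in kept[start:start + window])
            seen[chunk].append((name, kept[start][0]))
    return {chunk: places for chunk, places in seen.items() if len(places) > 1}
```

Calling `find_duplicates({"a.py": text_a, "b.py": text_b})` returns only the runs that show up in more than one place, so two copies of the same three statements still match even if one copy is indented differently or has a comment wedged in front of it.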
Simian hasn’t been updated for a couple of years now, but it works. It’s super easy to point it to your code and have it find duplicates. I encourage you to try it. Unfortunately, you may be in for an unpleasant surprise.
Friends don’t let friends copy & paste. That’s why you’ll want to set up this tool (or something similar ;) to run nightly against your code base to see if you’re getting better or worse. I’ve built a “total duplication” count and a “top 10 biggest chunks of duplicate code” report, both updated every night.
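The two nightly numbers are easy to derive once you have a list of duplicate blocks from whatever tool you use. Below is a rough sketch, not the actual reports described above. The `DuplicateBlock` type and `summarize` function are hypothetical, and "total duplication" is assumed here to mean the lines you could delete if every block were kept only once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplicateBlock:
    """One duplicated chunk as reported by a detector (hypothetical shape)."""
    line_count: int   # length of the chunk, in lines
    locations: tuple  # (file name, first line number) for every copy


def summarize(blocks, top_n=10):
    """Return (total redundant lines, the `top_n` largest blocks)."""
    # A block with k copies has k - 1 redundant copies.
    total = sum(b.line_count * (len(b.locations) - 1) for b in blocks)
    # Largest chunks first; these are usually the best refactoring targets.
    biggest = sorted(blocks, key=lambda b: b.line_count, reverse=True)[:top_n]
    return total, biggest
```

Storing the total from each nightly run lets you plot the trend over time, which is what tells you whether things are getting better or worse.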
{ 5 comments }
For the past 2.5 years I’ve been distracted by family matters but I have some updates to Simian in the works :)
Simon, thanks – I know what you mean. I do hope you can find the time to keep Simian updated in the future. It’s a great tool.
Well, that’s true. I find duplicate code all the time, but I’m sometimes very afraid to remove it from production code… if it’s too old. I haven’t tried Simian, but I’ll give it a shot.
Working with legacy code is annoying. Eliminating duplication rarely bites you, though. What bites you is fixing a bug but forgetting to fix it in the other three places…
If you work a lot with legacy code, you may find Working Effectively with Legacy Code interesting.
See http://www.semanticdesigns.com/Products/Clone/ for a tool that detects duplicate code in spite of format changes, minor editing, inserted statements, etc., using full language parsers to guide the analysis process. It works for a wide variety of languages. See the website for sample reports of clone detection applied to different source applications in different languages.