Refactor to prevent duplication

Link to this posting

Postby Ursego » 21 Feb 2013, 21:05

Do your best to prevent code duplication. Extract duplicated code fragments to a brand-new function and call it from the locations the code has been moved from.

If duplicated fragments are in classes, inherited from a same ancestor, then create the new function in the ancestor and call it from the descendants. If the classes are not inherited from one ancestor then create a generic function in a third class (even if you have to create that class only for one function!). If you are working with a not object oriented language (like C or Transact-SQL), simply extract the code to to a brand new function / stored procedure.

There are two reasons to hate code duplication. Of course, you know them well (you are a good programmer, and these reasons are super-obvious for you). But, anyway, I will provide them here for the case this topic will be found by less experienced developers who cannot imagine their lives without copypaste.

1. If a change (like enhancement or bug fix) is made in one function, another function with the same fragment can be forgotten. In fact, the developer can even be unaware, that a duplication exists somewhere, burred in kilometers of code.

2. The overall length of code decreases, functions are kept smaller. Let me quote the topic Keep functions simple:

Divide your program into short methods that perform one identifiable task. The main function should call well-named sub-functions which call sub-sub-functions (and so on, and so on...), so a developer can easily "travel" up and down between different levels of abstraction (each time concentrating only on the current one) through hierarchies of any depth. That will dramatically improve understanding of complex systems and facilitate debugging.

What if the code fragments are very similar, but not absolutely identical?

Extract similar code fragments (and merge functions with almost identical functionality) into a brand-new smart generic function. Call it from the locations where the algorithm result is consumed, each time supplying the unique (location-specific) data.

In general, there are 3 ways to supply the unique data: as an argument, by returning from a dedicated function, and via an instance variable:

Supplying unique data as an argument.

The specific (different) data can be supplied as argument(s) of the new generic function - that is the most common and preferable way. For example, we have two original functions in two different classes (Car and Bike) which have a same ancestor (Transport):

*** BAD code: ***

printCarInfo() of the class Car:
Code: Select all
[fragment 1]
String transportDesc = "Car";
[fragment 2]

printBikeInfo() of the class Bike:
Code: Select all
[fragment 1] // same as [fragment 1] in printCarInfo() of class Car
String transportDesc = "Bike"; // oops, this line is different...
[fragment 2] // same as [fragment 2] in printCarInfo() of class Car

*** GOOD code: ***

printTransportInfo(), created in the ancestor class, Transport:
Code: Select all
[fragment 1] // same as [fragment 1] in printCarInfo() of class Car and printBikeInfo() of class Bike
String transportDesc = newTransportDesc; // newTransportDesc is the function's argument; "Car" or "Bike" can be passed
[fragment 2] // same as [fragment 2] in printCarInfo() of class Car and printBikeInfo() of class Bike

Supplying unique data by returning from a dedicated function.

In the next example, getTransportDesc() was created only for the sake of returning the circumstances-dependent data to one universal algorithm. Of course, in such a simple case as returning a hardcoded value, you should use the previous method (an argument), but sometimes the unique data is obtained or calculated in different ways for different classes, so a dedicated function can be a good solution:

** GOOD code: ***

getTransportDesc() implemented in the class Car:
Code: Select all
return "Car";

getTransportDesc() implemented in the class Bike:
Code: Select all
return "Bike";

printTransportInfo(), created in the ancestor class, Transport:
Code: Select all
[fragment 1] // same as [fragment 1] in printCarInfo() of class Car and printBikeInfo() of class Bike
String transportDesc = this.getTransportDesc();
[fragment 2] // same as [fragment 2] in printCarInfo() of class Car and printBikeInfo() of class Bike

If the original duplicated fragments were in classes, inherited from a same super-class, then create getTransportDesc() in that super-class as protected. Make it an abstract method if your programming language supports them. If doesn't (so you are forced to implement the function on the ancestor level, which doesn't make any sense) then throw an exception which will let developers know that the ancestor's version of the function must never be called since it's only intended to be overridden.

Supplying unique data via an instance variable.

Finally, there is one more way to achieve the goal - we can populate an instance variable while initializing the instance (for example, in its constructor) - of course, if the value is know in that moment and will not change later. In our example, we could declare an object field final String transportDesc;, populate it in the constructor of each descendant, and use directly in printTransportInfo() instead of the local variable with the same name. printTransportInfo() should validate the field and throw an exception if it's empty (i.e. has been forgotten to be populated in a descendant). This method is good only if the specific data is static by its nature (like an entity description), i.e. will not change after initialization. In other circumstances it can be confusing and bugs-prone, so it's better to use other methods.


So, keep in mind this simple rule:


In one of my projects the old problem of duplicated "almost same" code was so serious that I suggested to perform refactoring of entire application sub-system, and my suggestion was accepted - I spent a whole week to do that. As a result, my resume was decorated with the following beautiful fragment:

Proposed and performed code refactoring of existing corporative reporting system, moving duplicating functionality from descendant to ancestor classes. Facilitated maintenance and troubleshooting and made adding of new reports much easier and quicker.

But, of course, it's better to spend some time before initial coding and think how to organize classes instead of thoughtless straightforward coding which forces the keys Ctrl+C and Ctrl+V to over-heat.


If you find yourself thinking whether to duplicate code using copy-paste (10 minutes of work) or move it to a third place (2 hours including testing) - stop thinking immediately! NEVER THINK ABOUT THAT, even in last days of your contract with that company - SIMPLY TAKE THE CODE TO A THIRD PLACE and call it from where needed.

Nobody will ever complain that you acted according to the best practices, but many bad words can be addressed to you (by other developers and even by yourself!) when problems, caused by code duplication, will appear in the future (not necessarily bugs but even difficulties with understanding of program flow or debugging).

If you are still in doubt regarding spending your time (which really belongs to the company) - ask your manager. But it's better to do the work well and after that, if needed, to explain the manager why it has taken longer time: if the manager has the idea of high-quality programming then your effort will be appreciated!
User avatar
Site Admin
Posts: 120
Joined: 19 Feb 2013, 20:33

IF you want to ((lose weight) OR (have unbelievable (brain function AND mental clarity))) THEN click:

free counters

eXTReMe Tracker